NOVEL HEDGEHOG-DERIVED POLYPEPTIDES
BACKGROUND OF THE INVENTION 

1. Field of the Invention

This invention relates generally to the field of protein processing and protein signaling
pathways and specifically to two novel proteins having distinct activities, which are
derived from a common hedgehog protein precursor.

2. Description of the Related Art 

Embryologists have long performed experimental manipulations that reveal the striking
abilities of certain structures in vertebrate embryos to impose pattern upon surrounding
tissues. Speculation on the mechanisms underlying these patterning effects usually
centers on the secretion of a signaling molecule that elicits an appropriate response from
the tissues being patterned. More recent work aimed at the identification of such
signaling molecules implicates secreted proteins encoded by individual members of a
small number of gene families. One such family of proteins, which may have an
influential effect upon patterning activities, comprises the proteins encoded by the
hedgehog gene family.

The hedgehog (hh) gene was initially identified based on its requirement for normal
segmental patterning in Drosophila (Nusslein-Volhard, C. & Wieschaus, E., Nature,
287:795-801, 1980). Its functions include local signaling to coordinate the identities of
adjacent cells within early embryonic segments (Hooper, J.E., & Scott, M.P., Early
Embryonic Development of Animals, pp. 1-48, 1992) and a later function in cuticle
patterning that extends across many cell diameters (Heemskerk, J. & DiNardo, S., Cell,
76:449-460, 1994). The hh gene also functions in the patterning of imaginal precursors
of adult structures, including the appendages and the eye (Mohler, J., Genetics,
120:1061-1072, 1988; Ma, et al., Cell, 75:927-938, 1993; Heberlein, et al., Cell,
75:913-926, 1993; Tabata, T. & Kornberg, T.D., Cell, 76:89-102, 1992; Basler, K. &
Struhl, G., Nature, 368:208-214, 1994). Genetic and molecular evidence indicates that
hedgehog proteins are secreted and function in extracellular signaling (Mohler, J.,
supra; Lee, et al., Cell, 71:33-50, 1992; Taylor, et al., Mech. Dev., 42:89-96, 1993).

In vertebrates, activities encoded by hh homologues have been implicated in
anterior/posterior patterning of the limb (Riddle, et al., Cell, 75:1401-1416, 1993; Chang,
et al., Development, 120:3339, 1994), and in dorsal/ventral patterning of the neural tube
(Echelard, et al., Cell, 75:1417-1430, 1993; Krauss, et al., Cell, 75:1431-1444, 1993;
Roelink, et al., Cell, 76:761-775, 1994).

The vertebrate ventral midbrain contains neurons whose degeneration or abnormal
function is linked to a number of diseases, including Parkinson's disease and
schizophrenia. It is known that motor neurons develop in close proximity to the floor
plate in the ventral midbrain. Midbrain projections to the striatum are involved in the
control of voluntary movement (Bjorklund and Lindvall, In: Handbook of Chemical
Neuroanatomy, eds., Bjorklund, et al., Amsterdam: Elsevier, pp. 55-122, 1984) and loss
of these neurons results in the motor disorders of Parkinson's disease (Hirsch, et al.,
Nature, 334:345, 1988). Midbrain dopaminergic neurons that innervate limbic structures
and the cortex influence emotional and cognitive behavior, respectively, and abnormal
function of these neurons has been associated with schizophrenia and drug addiction
(Seeman, et al., Nature, 365:441, 1993).

While the molecular nature of the factors that specify neuronal cell fate has not been
established, members of the transforming growth factor-β (TGF-β) (Lyons, et al., Trends
in Genetics, 7:408, 1991) or the hedgehog protein family (Smith, J.C., Cell, 76:193,
1994) may possess the characteristics expected from such factors, as they participate in
specification of cell fate, mediate inductive interactions between tissues, and in many
cases act at a distance of only a few cell diameters.

The present invention establishes that hh activities encoded by these genes play a crucial
role in early patterning of the developing eye and in patterning of the brain. For the first
time, the invention shows that internal cleavage of the hedgehog protein product is critical
for full function, and that the two novel products of this auto-proteolytic cleavage display
distinguishable activities, thus demonstrating that hh signaling activity is a composite
effect of two separate signaling proteins that derive from a common hh protein precursor.
In so doing, the invention provides the means for specific patterning and proliferation
of desired neuronal cell types for addressing disorders which arise from neuronal
degeneration or abnormal function.

SUMMARY OF THE INVENTION

The present invention is based on the seminal discovery that hedgehog proteins undergo
auto-proteolytic cleavage which results in two separate proteins having distinct
functional and structural characteristics. The two polypeptides, referred to as the "N" and
"C" fragments of hedgehog, or N-terminal and C-terminal fragments, respectively, are
produced after specific cleavage at a G↓CF site recognized by the autoproteolytic domain
in the native protein. The "C" fragment functions as a cholesterol transferase during
autoproteolysis, thus allowing cholesterol modification of the "N" fragment.

Thus, in one embodiment, the invention provides a substantially pure polypeptide
characterized by having an amino acid sequence derived from amino terminal amino
acids of a hedgehog protein and having at its carboxy terminus a G↓CF cleavage site
specifically recognized by a proteolytic activity of the carboxy terminal fragment of the
native hedgehog polypeptide. The invention also provides a substantially pure
polypeptide characterized by having an amino acid sequence of a hedgehog polypeptide or a
fragment derived from amino terminal amino acids of a hedgehog polypeptide, wherein
the polypeptide or fragment thereof comprises a sterol moiety. Fragments derived from
a native hedgehog polypeptide are included and preferably include extracellular amino
acid residues, such as those derived from the N fragment. In one embodiment of the
invention, the sterol moiety is cholesterol.

In another embodiment, the invention provides a substantially pure polypeptide
characterized by having an amino acid sequence derived from carboxy terminal amino
acids of a hedgehog protein and having at its amino terminus a G↓CF cleavage site
specifically recognized by a proteolytic activity of the carboxy terminal fragment of the
native hedgehog polypeptide.

The invention also provides a method for modulating proliferation or differentiation of
neuronal cells, comprising contacting the cells with a hedgehog polypeptide. The native
hedgehog polypeptide, the N, or the C fragment, or functional fragments derived
therefrom, are most useful for the induction of proliferation or differentiation of neuronal
cells substantially derived from floor plate neuronal cells.

In yet another embodiment, the invention provides a method for identifying a compound
which affects hedgehog activity comprising incubating the compound with hedgehog
polypeptide, or with biologically active fragments thereof, or with a recombinant cell
expressing hedgehog, under conditions sufficient to allow the components to interact;
and determining the effect of the compound on hedgehog activity or expression. For
example, cholesterol level (e.g., biosynthesis or transport) is measured as an indicator
of hedgehog activity. In one aspect of the invention, the method provides a means for
affecting cholesterol biosynthesis or transport in a cell comprising contacting a cell with
an effective amount of a compound that affects hedgehog, thereby affecting cholesterol
biosynthesis or transport. The effect may be inhibition or stimulation of cholesterol
biosynthesis or transport.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGURE 1 shows processing of the hh protein by immunoblots (A, C) with antibodies
against amino (Ab1) and carboxy-terminal (Ab2) epitopes. FIGURES 1B and 1D are blots
of samples immunoprecipitated with Ab1 (B, lanes 7-9), Ab2 (D, lanes 19-21), or
pre-immune serum (B, lanes 10-12, and D, lanes 22-24).

FIGURES 1E and 1F show a schematic illustration of the hedgehog cleavage mechanism.



FIGURE 2 shows sequence similarity between hh proteins and serine proteases. hh
protein sequences are aligned to residues 323 to 329 of the D. melanogaster protein and
numbered as positions 1 to 7 (group A). The catalytic histidines of mammalian serine
proteinases (group B) are aligned to the invariant histidine at position 7 in hh proteins.

FIGURE 3 shows autoproteolysis of the hh protein. FIGURE 3A is a Coomassie blue-stained
polyacrylamide gel showing production and purification of His6-U and His6-U H329A
proteins from E. coli. Samples were molecular weight markers (lanes 1 and 2); lysates
of E. coli cells carrying the His6-U expression construct without (lane 3) and with (lane
4) induction by IPTG; purified His6-U protein (lane 5); lysates of E. coli cells that carry
the His6-U H329A expression construct without (lane 6) and with (lane 7) induction by
IPTG; purified His6-U H329A protein (lane 8). FIGURE 3B is an immunoblot detected
with Ab2 showing transfected S2 cells induced to express hh (lane 1); His6-U and
His6-U H329A proteins incubated in cleavage reaction buffer for 0 hours (lanes 2 and 5),
for 20 hours (lanes 3 and 6), and for 20 hours in the presence of 20 mM TAME (a serine
protease inhibitor) (lanes 4 and 7).



FIGURE 4 shows, by in vitro translations of wild-type and mutant hh proteins, that the
autoproteolytic functions of Drosophila (4A-C) and zebrafish (4D) hh proteins map to
the carboxy terminal fragments. The locations of mutations and cleavage sites (arrows)
in these proteins are illustrated schematically in 4E.

FIGURE 5 shows immunoblots showing heat shock-induced expression of wild type and
H329A mutant hh proteins in Drosophila embryos. (A) and (B) are immunoblots
developed using Ab1 and Ab2 antibodies, respectively. Lanes 1 and 6, induced
untransfected S2 cells; lanes 2 and 7, transfected S2 cells induced to express hh; lanes
3 and 8, heat shocked wild-type embryos; lanes 4 and 9, heat shocked hshh embryos;
lanes 5 and 10, heat shocked hshh H329A embryos.

FIGURE 6 shows, by in situ hybridization, the embryonic effects of ubiquitously
expressed wild type and H329A hh proteins. The embryonic distribution of wingless (wg)
RNA as revealed by in situ hybridization is shown in (A) wild-type (homozygous
y w1118), (B) hshh, and (C) hshh H329A embryos that were exposed to two 10-minute heat
shocks separated by a 90-minute recovery period (33). Wild-type embryos showed little
change in wg expression, whereas the wild-type protein and, to a lesser extent, the H329A
protein each induced ectopic wg expression (Table 1). Panels (D), (E), and (F) show the
dorsal surfaces of y w1118, hshh, and hshh H329A larvae, respectively, at the level of
the fourth abdominal segment. These larvae were shocked for 30 minutes as embryos and
allowed to complete embryogenesis. Cuticle cell types (1°, 2°, 3°, and 4°) are labeled
as described (J. Heemskerk and S. DiNardo, Cell 76, 449, 1994). Note the expansion of
2° cell types (naked cuticle) at the expense of 3° and some 4° types in the hshh embryo
(E) under conditions where the phenotype of hshh H329A embryos (F) is identical to that
of control embryos (D).

FIGURE 7 shows X-gal staining to show imaginal disc effects of ubiquitous wild type
and H329A hh proteins. X-gal staining was used to follow expression of wg (A-C) or dpp
(D-L) in imaginal discs of late third-instar larvae that carry wg-lacZ or dpp-lacZ reporter
genes. Leg (A-F), wing (G-I) and eye-antennal discs (J-L) from control larvae (A, D, G,
J), larvae carrying the hshh transgene (B, E, H, K) and larvae carrying the hshh H329A
transgene (C, F, I, L) are displayed. In all panels anterior is to the left.

FIGURE 8 (A) and (B) are immunoblots of cell pellets (lane 1) or supernatants (lane 2)
from transfected S2 cell cultures expressing HH protein, developed with Ab1 (A) and
Ab2 (B). Samples in each lane were from the same volume of resuspended total culture.
Whereas N remained mostly associated with the cell pellet (compare lanes 1 and 2 in A),
C was nearly quantitatively released into the supernatant (compare lanes 1 and 2 in B).
U displayed partitioning properties in between those of N and C (A and B). (C)
demonstrates the heparin binding activity of various HH protein species generated by in
vitro translations with microsomes (38). Samples were: total translation mix (lane 1);
supernatant after incubation with heparin agarose or agarose (control) beads (lanes 2 and
4); and material eluted from heparin agarose or agarose beads after washing (lanes 3 and
5). F, U, Nss and N fragments are depleted from reactions incubated with heparin agarose
but not agarose beads (compare lanes 2 and 4 to 1), and the same species subsequently
can be eluted from the heparin agarose but not the agarose beads (compare lanes 3 and
5 with lane 1).

FIGURE 9 shows the differential localizations of N and C in embryos. The in situ
localization of the hh transcript (A) is shown in comparison to the distribution of
N and C epitopes detected with Ab1 and Ab2 in panels (B) and (C), respectively. Note
that the distributions of N and C epitopes span approximately one-third and one-half of
each segmental unit, respectively, while the transcript is limited to approximately
one-quarter of each unit. In (D), the localization of C epitopes in embryos homozygous
for the hh13E allele is detected with the use of Ab2. C epitopes in this mutant, which
displays impaired auto-proteolytic activity (see text), are more restricted, and resemble
the wild-type localization of N. Homozygous hh13E embryos were identified by loss of
a marked balancer from a heterozygous parent stock. All embryos are at mid to late stage
9 (extended germ-band).

FIGURE 10 shows signal relay versus dual function models for hh protein action. In
FIGURE 10(A), the long-range effects of hh signaling are achieved indirectly through
short-range induction of a second signaling molecule (X). Based on its biochemical
properties and its restricted tissue localization, N is presumed to represent the active
short-range signal while the role of C would be limited to supplying the catalytic
machinery required for biogenesis of N. In (B), the long- and short-range signaling
functions of hh are supplied by the N and C proteins derived by internal auto-proteolysis
of the U precursor. N is implicated in short-range signaling by retention near its cellular
site of synthesis, while C is less restricted in its distribution and would execute
long-range signaling functions. In both models, auto-proteolysis is required to generate
fully active signaling proteins.

FIGURES 10C and 10D show an immunoblot of the N fragment synthesized from a wild
type construct (C) or a construct lacking the C domain (D).

FIGURES 11A and 11B show the nucleotide and deduced amino acid sequences for partial
human hh clones.

FIGURES 12A and 12B show in vitro cleavage reactions of a Drosophila hh protein
produced in E. coli and purified to homogeneity. Panel A shows a time
course of cleavage after initiation by addition of DTT. Panel B shows incubations at
concentrations ranging over three orders of magnitude for a fixed time period (four hours),
with no difference in the extent of conversion to the cleaved form. Panel C shows the
sequence around the cleavage site as determined by amino-terminal sequencing of the
cleaved fragment C. The cleavage site is denoted by the arrow, and the actual residues
sequenced by Edman degradation of the C fragment are underlined. Panel C also shows
an alignment of all published vertebrate hh sequences plus some unpublished
sequences from fish and Xenopus. The sequences shown correspond to the region of
Drosophila hh where the cleavage occurs, and demonstrate the absolute conservation
of the Gly-Cys-Phe sequence at the site of cleavage. Panel D shows an SDS-PAGE gel
loaded with in vitro transcription/translation reactions as described in the previous
Examples, using various hh genes as templates. dhh is Drosophila, twhh and zfshh are
the twiggy-winkle and sonic hh genes of the zebrafish, and mshh is the shh/Hhg-1/vhh-1
gene of the mouse. Panel E shows that Edman degradation of the C fragments releases
35S counts on the first but not subsequent rounds for all these proteins, indicating that the
site of autoproteolytic cleavage for all of these hh proteins is the amide bond to the
amino-terminal side of the Cys residue that forms the center of the conserved
Gly-Cys-Phe sequence highlighted in panel C.
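
The cleavage position described for panels C and E can be illustrated with a minimal Python sketch that splits a one-letter amino acid sequence at the amide bond between Gly and Cys of a Gly-Cys-Phe tripeptide. The function name `split_at_gcf` and the toy sequence are hypothetical and are not taken from any hedgehog sequence in the figures; using the first occurrence of the motif is likewise an assumption made only for illustration.

```python
from typing import Tuple


def split_at_gcf(seq: str) -> Tuple[str, str]:
    """Split a one-letter protein sequence between G and C of the first G-C-F motif.

    Returns (amino-terminal part ending in G, carboxy-terminal part starting with C).
    """
    idx = seq.find("GCF")
    if idx < 0:
        raise ValueError("no Gly-Cys-Phe motif found")
    cut = idx + 1  # the cut falls after Gly, i.e. on the amino-terminal side of Cys
    return seq[:cut], seq[cut:]


# Hypothetical toy sequence, used only to show where the cut falls.
toy = "MKTAAGCFPLLV"
n_frag, c_frag = split_at_gcf(toy)
print(n_frag, c_frag)
```

In this sketch the carboxy-terminal piece begins with Cys, which matches the observation that the first Edman round on the C fragment releases the labeled residue.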

FIGURE 13 shows predicted amino acid sequences in single letter code.
13(a) shows sequences common to five distinct hh-like genes, with a dot
indicating identity with the corresponding residue of zebrafish twiggy-winkle. 13(b)
shows amino acid sequences of twhh and shh aligned to those of the sonic/vhh-1
class from chick and mouse. The amino-terminal hydrophobic stretch common to all four
hh genes is shaded. The asterisk (*) denotes invariant amino acid residues associated
with the proteolytic domain of the C fragment from various species. 13(c) shows percent
identity of residues carboxy-terminal to the hydrophobic region.

FIGURES 14A-S show a comparative expression of twhh, shh, and pax-2 during
zebrafish embryogenesis.

FIGURE 15, panels 15A-15I, shows the effects of ectopic hh on zebrafish development.
Wild type zebrafish (Danio rerio, Ekkwill Waterlife Resources) were maintained at
28.5°C; some embryos were then cultured overnight at RT. Zebrafish embryos were
injected at the 1-8 cell stage with twhh, shh, or lacZ RNA and examined at 28 h of
development. (a-c) Dorsal view of the midbrain-hindbrain region; anterior is left. (a)
lacZ. (b) twhh. (c) shh. (d-f) Frontal optical section of the forebrain region; anterior is
up. (d) lacZ. (e) twhh. (f) shh. (g-i) Lateral view of the eye region; anterior is left. (g)
lacZ. (h) twhh. (i) twhh.

FIGURE 16 is a table showing the effects of ectopic expression of shh, twhh and twhh
mutants on zebrafish embryonic development.

FIGURE 17 shows zebrafish twiggy-winkle hedgehog derivatives. FIGURE 17A shows
cartoons of various twhh open reading frames. SS (shaded) is the predicted N-terminal
signal sequence for secretion of these proteins and encompasses the first 27 amino acids
of each open reading frame. The arrow indicates the predicted internal site of
auto-proteolytic cleavage. Amino acid residue numbers are according to FIGURE 13(b). The
filled triangle denotes the normal termination codon for the twhh open reading frame.
Construct UHA contains a mutation that blocks auto-proteolysis (the histidine at residue
273 is changed to an alanine; see Lee, J.J., et al., supra). Construct U356HA contains a
stop codon in place of amino acid residue 357 as well as the H273A mutation in UHA.
Construct N encodes just the first 200 amino acids of twhh. Construct C has had the
codons for residues 31-197 deleted.

FIGURE 17B shows in vitro translation of the expression constructs shown schematically
in FIGURE 17A. Constructs were translated in vitro in the presence of 35S methionine and
analyzed by autoradiography after SDS-PAGE.

FIGURES 18A and 18B show Northern blot analysis of the effect of hedgehog on
expression of various neural markers.

FIGURES 19A and 19B show hh synergy with naturally occurring neural markers or
agents (e.g., XAG-1, XANF-2, Otx-A, En-2, Krox-20, XlHbox-6, NCAM, and EF-1α).

FIGURE 20A shows hh constructs including delta N-C.

FIGURE 20B shows a Northern blot analysis of the effect of hedgehog N or C on various
neural markers.

FIGURE 21 shows that ΔN-C interferes with X-bhh and N activity in animal cap explants,
as shown by RT-PCR analysis.

FIGURE 22A is an illustration of lipid stimulation of hedgehog autoprocessing.

FIGURE 22B shows a Coomassie blue-stained SDS-PAGE of autocleavage reactions in
bacterially expressed His6Hh-C protein.

FIGURE 23A is a thin layer chromatography (TLC) plate coated with silica gel G
(Merck) showing the fractionation of bulk S2 cell lipids using a heptane:ether:formic
acid solvent (80:20:2).

FIGURE 23B is a Coomassie blue-stained SDS-polyacrylamide gel showing in vitro
autocleavage reactions of the bacterially expressed His6Hh-C protein incubated with 1
mM DTT plus either unfractionated S2 cell lipids (lane 1), or spots A through F (lanes
2-7, respectively).

FIGURE 23C is a TLC of S2 cell lipids (lane 1) along with selected lipid standards:
phosphatidylcholine (lane 2), a diacylglycerol (lane 3), cholesterol (lane 4), stearic acid
(lane 5), a triacylglycerol (lane 6), and cholesteryl ester (lane 7). Lipid spot B comigrates
with cholesterol, as also demonstrated by mixing radio-labeled cholesterol with S2 lipids
before TLC fractionation.

FIGURE 23D is a Coomassie blue-stained SDS-polyacrylamide gel showing that, relative
to 1 mM DTT alone (lane 1), cholesterol (0.35 mM) + 1 mM DTT (lane 2) stimulates
His6Hh-C autocleavage in vitro.

FIGURE 23E is an autoradiogram of electrophoretically-resolved products of His6Hh-C
autocleavage reactions driven by 20 mM DTT (lane 1) or 1 mM DTT + 0.35 mM
cholesterol (lane 2).

FIGURE 24A shows Coomassie-stained gels of His6Hh-C autocleavage reactions carried
out in the presence of 20 mM DTT (lane 1), or 1 mM DTT + 0.35 mM cholesterol (lane 2).
Lane 3 contains a mixture of the samples loaded in lanes 1 and 2.

FIGURE 24B shows Coomassie-stained gels showing protein products of His6Hh-C
autocleavage reactions carried out in the presence of 1 mM DTT + 0.35 mM cholesterol
(lanes 1 and 2) or with 20 mM DTT (lane 3).

FIGURE 24C is an autoradiogram of immunoblotted Hh amino-terminal domains 
purified from cultured S2 cells. 

FIGURE 25A is an autoradiogram of a gel loaded with total cell proteins from S2 cells
containing a stably integrated Cu2+-inducible hedgehog gene.

FIGURE 25B is an HPLC profile of sterols separated on a C18 column by isocratic
elution with a solvent containing methanol:ethanol:water (86:10:4).

FIGURE 25C shows HPLC analysis as in (B) of the adduct released by base treatment 
of Hh-Np metabolically labeled with [3H]cholesterol (A). 

FIGURE 25D shows metabolic labeling of vertebrate Sonic hedgehog protein with
[3H]cholesterol. Autoradiogram of a gel loaded with total cell proteins from COS-7 cells
transfected with a wild-type Sonic hedgehog expression construct (Shh, lane 1) or a
construct that generates an unprocessed amino-terminal protein truncated after the
conserved glycine at the site of autocleavage (Shh-N, lane 2).

FIGURE 26A is a schematic drawing of a two-step mechanism for Hh autoprocessing.
Aided by deprotonation by either solvent or a base (B1), the thiol group of Cys-258
initiates a nucleophilic attack on the carbonyl carbon of the preceding residue, Gly-257.
This attack results in replacement of the peptide bond between Gly-257 and Cys-258 by
a thioester linkage (step 1). The emerging α-amino group of Cys-258 likely becomes
protonated, and an acid (A) is shown donating a proton. The thioester is subject to a
second nucleophilic attack from the 3β-hydroxyl group of a cholesterol molecule, shown
here facilitated by a second base (B2), resulting in a cholesterol-modified amino-terminal
domain and a free carboxy-terminal domain. In vitro cleavage reactions may also be
stimulated by addition of small nucleophiles including DTT, glutathione, and
hydroxylamine.

FIGURE 26B is a schematic drawing of a mechanism for intein self-splicing. A base (B1')
or solvent deprotonates a cysteine or serine residue at the N-extein/intein junction (shown
here as a cysteine residue) for attack on the carbonyl group of the preceding amino-acid
residue, resulting in the formation of a thioester/ester intermediate. An acid (A') may
protonate the α-amino group of the cysteine/serine residue, promoting its release. The
thioester/ester is then subject to a second nucleophilic attack from a cysteine, serine, or
threonine residue at the intein/C-extein junction (shown here as a cysteine residue). A
second base (B2') is shown facilitating deprotonation of the second nucleophile, although
this function may also be carried out by B1'. This reaction produces a branched protein
intermediate that ultimately resolves to a free intein and ligated exteins.

FIGURE 27 is a Coomassie Brilliant Blue-stained SDS-polyacrylamide gel showing in vitro
autocleavage reactions of bacterially-expressed His6Hh-C25 (lanes 1-3) and His6Hh-C17
(lanes 4-6) proteins. Proteins were incubated with 1 mM DTT (lanes 1 and 4), 50 mM
DTT (lanes 2 and 5) or 350 µM cholesterol/1 mM DTT (lanes 3 and 6). The uncleaved
His6Hh-C25 protein migrates as a ~29-kDa species, and the carboxy-terminal cleavage
product of this protein migrates as a ~25-kDa species (Porter et al., 1996). The
uncleaved His6Hh-C17 protein migrates as a ~21-kDa species, and the carboxy-terminal
product of this truncated protein migrates as a ~14-kDa species. The amino-terminal
product of the His6Hh-C25 and His6Hh-C17 proteins migrates as a ~7-kDa species when
DTT-modified or as a ~5-kDa species when cholesterol-modified. His6Hh-C17 was also
incubated with 46 µM [3H]cholesterol/1 mM DTT, and no cholesterol-modified product
was detected by autoradiography. A cholesterol-transfer activity 1% of wildtype could
have been detected by this radioassay.

FIGURE 28A is a ribbon diagram of Hh-C17. The amino- (N) and carboxy- (C) termini are
labeled. This panel was prepared with MOLSCRIPT (Kraulis, 1991).

FIGURE 28B is a topology diagram of Hh-C17. Residues in β strands are in boxes with
amino-acid type and number indicated. Residues in turns of 3₁₀ helix are in ovals with
amino-acid type and number indicated. Other residues in the structure are in boxes with
amino-acid number indicated. Hydrogen bonds between β strands are indicated with
arrows. A pseudo two-fold axis of symmetry is indicated with a diamond. This panel
was prepared using the output of the program PROMOTIF.

FIGURE 29A shows the pseudo two-fold symmetry in Hh-C17. A stereodiagram of a trace of
the α-carbon backbone of residues 258-393 of Hh-C17 viewed along the pseudo two-fold
symmetry axis is shown. Equivalent loops are colored identically. Residues 258-276
and 324-347 are colored yellow, residues 276-301 and 347-373 are colored magenta, and
residues 312-320 and 381-389 are colored cyan. The pseudo two-fold axis is indicated
with a closed circle.

FIGURE 29B is a stereodiagram of a backbone trace of Hh-C17 with residues
258-323 colored green and residues 324-395 colored yellow. The extended loops that
make up the Hh-C17 structure are labeled in the order in which they appear in the
amino-acid sequence, A1-A2-A3-B1-B2-B3. Two structurally cohesive subdomains are
apparent, one comprising loops A1, A2, and B3 and another comprising loops B1, B2,
and A3. Hh-C17 appears to have arisen from a tandem duplication of a primordial gene
to produce the 'A' and 'B' sequence regions coupled with exchange of the homologous
A3 (residues 310-323) and B3 (residues 379-395) loops to form structural subdomains
that are hybrids of 'A' and 'B' sequences. A pivot about which exchange of these loops
appears to have occurred is indicated by an arrow.

FIGURE 29C is a stereodiagram of backbone traces of the regions of Hh-C17
corresponding to the sequence duplication (residues 259-320 colored green and residues
325-389 colored yellow) following superposition. The structures were aligned
with the program QUANTA (Polygen). The r.m.s. deviation in α-carbon position for 50
matched residues in the subdomains is 1.38 Å. Conserved β turns (see below) are
colored red. Panels A, B and C were prepared with MOLSCRIPT (Kraulis, 1991, supra).
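
For readers unfamiliar with the r.m.s. deviation quoted for FIGURE 29C, the following Python sketch shows how such a value is computed from two sets of matched α-carbon coordinates. It assumes the two coordinate sets have already been superimposed (the superposition step performed here by QUANTA is not reproduced), and the function name `rmsd` and the example coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Root-mean-square deviation between matched, already superimposed points."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(a, b):
        # Squared distance between each matched pair of atoms.
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(a))


# Hypothetical coordinates in angstroms: two matched atoms, one displaced by 2 A.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(round(rmsd(set_a, set_b), 3))
```

Applied to the 50 matched α-carbon pairs of the two subdomains, a calculation of this form would yield the single summary figure reported in the legend.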

FIGURE 29D is a structure-based alignment of the amino-acid sequences of the two
subdomains of Hh-C17. Conserved amino acids are highlighted with yellow. Active site
residues are in red. β strands are indicated with arrows. β1b and β2b are slightly longer
than β1a and β2a, respectively, and are indicated with lighter green coloring. Fractional
solvent accessibility (FSA) is shown in blue for each residue in the Hh-C17 structure. The
FSA is the ratio of the solvent accessible surface area of residue X in a Gly-X-Gly
tripeptide vs. in the Hh-C17 structure. A value of 0 represents a value from 0.00 to 0.09,
1 represents 0.10 to 0.19, and so on. Type I β turns are conserved at homologous
positions in both Hh-C17 subdomains at residues 260-263 (homologous to residues
326-329) and residues 317-320 (homologous to residues 386-389). A type II β turn is
conserved between both subdomains at residues 279-282 (homologous to residues
350-353), and a type IV β turn is conserved between both subdomains at residues 288-291
(homologous to residues 359-362). β bulges are found at homologous positions in both
Hh-C17 subdomains at residues 282 (homologous to residue 353) and 300 (homologous
to 372).
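
The single-digit FSA code described for FIGURE 29D (0 for 0.00 to 0.09, 1 for 0.10 to 0.19, and so on) can be sketched in Python as follows. The function name `fsa_digit` is hypothetical, and capping values of 0.90 and above at the digit 9 is an assumption, since the legend does not state how a ratio of 1.0 or more is displayed.

```python
import math


def fsa_digit(fsa: float) -> int:
    """Map a fractional solvent accessibility to the one-digit code 0-9."""
    if fsa < 0:
        raise ValueError("fractional solvent accessibility cannot be negative")
    # Each digit covers a bin of width 0.10; the small epsilon guards against
    # floating-point values that fall just below a bin boundary.
    digit = int(math.floor(fsa * 10 + 1e-9))
    return min(digit, 9)  # assumption: anything at or above 0.90 is shown as 9


for value in (0.05, 0.10, 0.19, 0.95):
    print(value, fsa_digit(value))
```

A residue with FSA 0.19 therefore receives the same digit as one with FSA 0.10, so the displayed code resolves accessibility only to the nearest tenth.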

FIGURE 30A is a stereodiagram of the nucleophilic residue, Cys-258, and nearby
residues. Distances (Å) between atoms are indicated.

FIGURE 30B is a ribbon diagram of Hh-C17 with the side chains of Cys-258 and other
putative active site residues indicated. Panels A and B were prepared with MOLSCRIPT
(Kraulis, 1991, supra).

FIGURE 30C shows a Coomassie Brilliant Blue-stained SDS-polyacrylamide gel
showing in vitro autocleavage reactions of bacterially-expressed His6Hh-C wildtype
(lanes 1-3) and mutant proteins, H329A (lanes 4-6), T326A (lanes 7-9), and D303A
(lanes 10-12). Proteins were incubated with 1 mM DTT (lanes 1, 4, 7, and 10), 50 mM
DTT (lanes 2, 5, 8, and 11) or 350 µM cholesterol/1 mM DTT (lanes 3, 6, 9, and 12).
The uncleaved protein migrates as a ~29-kDa species. The carboxy-terminal cleavage
product migrates as a ~25-kDa species and the amino-terminal product migrates as a
~7-kDa species when DTT-modified or as a ~5-kDa species when cholesterol-modified.
The significant level of apparent cleavage seen with the D303A protein with 1 mM DTT
results from preexisting cleavage products in the preparation; however, addition of 50
mM DTT greatly increases the amount of cleavage products and addition of cholesterol
does not produce a cholesterol-modified product (~5-kDa species). D303A was also
incubated with 46 µM [3H]cholesterol/1 mM DTT, and no cholesterol-modified product
was detected by autoradiography (data not shown). A cholesterol-transfer activity 1%
of wildtype could have been detected by this radioassay.

FIGURE 31A is an alignment of the Hh-C17 amino-acid sequence (residues 258-402)
with other Hh sequences, with nematode sequences homologous to Hh-C, and with intein
sequences. The alignment was constructed by superimposing the Hh-C and intein
alignments produced by the CLUSTALW program using the results of the PSI-BLAST
analysis as a guide (Thompson et al., 1994). Additionally, the alignment was verified by
analyzing a subset of the sequences containing fifteen diverse intein sequences and three
Hh-C sequences with the MACAW program (Schuler et al., 1991, supra). In this
analysis the alignment of the blocks containing the cysteine and histidine residues
implicated in catalysis was significant with p^ 10' s , and the block including β2b of Hh-C
with p < 10^-4. The exact counterpart of β4a in the intein sequences remained uncertain; the
respective region is replaced by the number of amino-acid residues. The position of the
endonuclease domain (ENDO - domain II according to Duan et al., 1997) inserted in the
intein sequences is shown and the number of amino acid residues in these domains is
indicated. A second inserted domain in the PI-SceI/YEAST intein thought to be involved
in DNA recognition (DRR) is located between β1b and β2b. Three inteins,
GYRA/MYCXE, DNAB/PORPU, and KLBA/METJA, contain a short insert replacing
the endonuclease domain. The yeast HO endonuclease does not undergo self-splicing,
but contains a vestigial, inactive intein domain. The KLBA/METJA intein homologue
in which the amino-terminal nucleophile is replaced by alanine is likely inactive as well.
A consensus sequence is shown above the aligned sequences and shows amino acid
residues conserved in at least one half of the sequences in each of the two aligned sets.
"U" indicates a bulky hydrophobic residue (I, L, V, M, F, Y, W), and "-" indicates a
negatively-charged residue (D or E). Catalytic site residues are highlighted with red;
hydrophobic residues are highlighted with yellow; other residues that conform with the
consensus are highlighted with blue. The secondary structure elements for Hh-C17
are shown. Every tenth residue in the Hh-C17 sequence is indicated with a dot. The
leftmost column shows abbreviated protein and species names, and the second column
shows the gene identification number in the NCBI protein database. Protein name
abbreviations: CE (R084B4.1), F46B3 (F46B3.C), M75 (ZK6.78.5), M89 (C29F3.d), ZK
(ZK1290.5), ZK377 (ZK377.1), M110 (T05C12.10) - uncharacterized nematode proteins
containing Hh carboxy-terminal domain homologues; HH - hedgehog; EHH - Echidna
hedgehog; CHH - Cephalic hedgehog; DHH - Desert hedgehog; IHH - Indian hedgehog;
BHH - Banded hedgehog; TWHH - Tiggy-winkle hedgehog; XHH - Xenopus hedgehog;
SHH - Sonic hedgehog; PI-SceI, PI-CtrI - yeast intein endonucleases; GYRA, GYRB -
DNA gyrase A and B subunits; RECA - recombinase; DNAB - replicative DNA helicase;
POLC - DNA polymerase III α subunit; CLPP - endopeptidase; IF-2 - translation
initiation factor 2; HELI - putative helicase; RFC - replication factor C; ORF -
uncharacterized open reading frame product; G6PT - glucose-6-phosphate transaminase;
RPO-A', RPO-A'' - DNA-dependent RNA polymerase subunits; RGYR - reverse
gyrase; PEPS - phosphoenolpyruvate synthase; UDGD - uridine diphosphate glucose
dehydrogenase; RNR - ribonucleotide reductase; DPOL - DNA polymerase, B family;
TFIIB - transcription factor IIB; KLBA - predicted ATPase; HO - homothallic
endonuclease. Species abbreviations: CAEEL - Caenorhabditis elegans; DANRE -
Danio rerio; XENLA - Xenopus laevis; CYNPY - Cynops pyrrhogaster; DROHY -
Drosophila hydei; DROME - Drosophila melanogaster; CANTR - Candida tropicalis;
MYCLE - Mycobacterium leprae; MYCXE - Mycobacterium xenopi; MYCTU -
Mycobacterium tuberculosis; PORPU - Porphyra purpurea; SYNSP - Synechocystis sp.;
CHLEU - Chlamydomonas; METJA - Methanococcus jannaschii; PYRFU - Pyrococcus
furiosus; PYRSP - Pyrococcus sp.; THELI - Thermococcus litoralis. Several Hh and
intein sequences closely related to those included were omitted.

FIGURE 31B is a stereo ribbon diagram of Hh-C17, showing where the endonuclease
domain and additional DNA recognition region of PI-SceI are inserted. The loop where
the endonuclease domain is inserted is colored red and the loop where the additional
DNA recognition region ("the arm of the self-splicing domain") is inserted is colored
blue. The orientation of the Hh-C17 in this view is the same as the orientation of the
PI-SceI intein in Figure 2 of Duan, et al., 1997, supra. This panel was prepared with
MOLSCRIPT.

FIGURE 32 is a schematic drawing illustrating the duplication and insertion events that
appear to have occurred during the evolution of Hh proteins and inteins. The insertion
of the intein into a host protein is not shown. The order of some of these events is
speculative. For example, dimerization through loop swapping may have preceded the
gene duplication that produced an Hh-C17-like protein. Abbreviations: Hh-C - Hh
carboxy-endonuclease domain, DRR - DNA recognition region.

FIGURE 33 shows inhibition of cholesterol biosynthesis by the plant steroidal alkaloid,
jervine.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides two novel polypeptides originally derived from a single
precursor protein, both of which have distinct structural and functional characteristics.
The proteins are derived from a hedgehog protein and can be naturally produced by
auto-proteolytic cleavage of the full-length hedgehog protein. Based on evidence provided
herein, which indicates that hedgehog precursor protein and the auto-proteolytic products
of hedgehog precursor protein are expressed in the floorplate of the ventral midline of
the neural tube and notochord, the invention now provides a method for the induction of
proliferation or differentiation of neuronal cells associated with or in close proximity to
the floorplate and notochord. The invention also provides cholesterol-modified hedgehog
polypeptides and functional fragments thereof.

In a first embodiment, the invention provides a substantially pure polypeptide
characterized by having an amino acid sequence derived from amino terminal amino
acids of a hedgehog protein and having at its carboxy terminus a
glycine-cysteine-phenylalanine (G↓CF) cleavage site specifically recognized by a
proteolytic activity of the carboxy terminal fragment of the native hedgehog polypeptide.
This fragment is denoted the N-terminal fragment or polypeptide or "N" herein. For
example, in the case of the Drosophila hedgehog, the N fragment includes amino acids
1-257 of hedgehog protein, wherein amino acids 85-257 have a molecular weight of about
19 kD by non-reducing SDS-PAGE (amino acid residue numbers 1-257 include
non-structural features such as signal sequences). The G↓CF cleavage site in Drosophila
hedgehog precursor protein occurs at amino acid residues 257-259. Those of skill in the
art will be able to identify the G↓CF cleavage site in other hedgehog genes, as the amino
acid location will be similar and the site will be specifically recognized by the
autoproteolytic activity of the corresponding C fragment.
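
As a purely illustrative sketch of how the GCF site might be located in a candidate
precursor sequence, the following Python fragment splits a sequence between the
glycine and the cysteine of the first G-C-F tripeptide. The function name
split_at_gcf and the toy sequence are hypothetical and are not taken from any
hedgehog protein. A real precursor may contain more than one GCF tripeptide, in which
case similarity of position to the Drosophila site (residues 257-259) would also have
to be considered.

```python
def split_at_gcf(precursor: str) -> tuple[str, str]:
    """Split a one-letter amino acid sequence at the first G-C-F tripeptide.

    The cut is placed between G and C, so the N fragment ends with glycine
    and the C fragment begins with cysteine-phenylalanine.
    """
    idx = precursor.find("GCF")
    if idx == -1:
        raise ValueError("no GCF tripeptide found in sequence")
    return precursor[: idx + 1], precursor[idx + 1 :]


# Hypothetical toy sequence used only to exercise the function.
toy_precursor = "MKTAYLLSGGCFPGSATVHLEQ"
n_fragment, c_fragment = split_at_gcf(toy_precursor)
print(n_fragment, c_fragment)
```

Taking the first occurrence is a simplifying assumption of this sketch, not a
statement about how the autoproteolytic domain selects its site.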

The N-terminal polypeptide is also characterized by being cell-associated in cells
expressing the polypeptide in vitro, and being specifically localized in vertebrate or
Drosophila cells or embryos, for example. In other words, this N-terminal fragment of
hedgehog remains close to the site of cellular synthesis. The association of N with the
cell is a result of the processing event, which involves lipophilic modification of the
amino terminal domain (see Figure 1E and Example 19). This modification is initiated
by the action of the carboxy terminal domain, generating a thioester intermediate; the
carboxy-terminal domain thus does not act simply as a protease, although cleavage of a
peptide bond does ultimately result from its action. Specifically, the lipid modification
is a cholesterol moiety. In addition, the N fragment binds to heparin agarose in vitro.

The N polypeptide of the invention is characterized by having an amino acid sequence
derived from amino terminal amino acids of hedgehog protein, e.g., 1-257 in Drosophila,
wherein amino acids 1-257 have a molecular weight of about 19 kD by non-reducing
SDS-PAGE. The N polypeptide includes smaller fragments which retain the functional
characteristics of full length N, e.g., bind to heparin. The hedgehog protein from which
N is derived includes, but is not limited to, Drosophila, Xenopus, chicken, zebrafish,
mouse, and human. Crystallographic analysis shows the structure of SHH-N includes
the presence of a zinc ion. While not wanting to be bound by a particular theory, the
presence of the zinc ion is suggestive of zinc hydrolase activity. Zinc hydrolases include
proteases such as carboxypeptidase A and thermolysin, lipases such as phospholipase C,
and other enzymes such as carbonic anhydrase. Alterations in the zinc hydrolase site of
the amino terminal signaling domain may be useful for modulating the range of diffusion
of a hedgehog protein or for altering the signaling characteristics of the amino terminal
signaling domain. For example, a mutation in the zinc hydrolase site may result in a
tethered protein where ordinarily the protein is secreted at a distance. The result would
be induction of a cell type not typically induced. Alteration in the zinc site may result in
a molecule capable of inducing motor neurons and not floor plate, and vice versa.

The identification of a cell-surface or extracellular matrix localization of N and its
expression in notochord and floor plate-associated cells provides a means for isolation
or specific selection of cells expressing N, e.g., to isolate a notochord sample or to isolate
floor plate cells. In addition, antibodies directed to N are useful for histological analysis
of tissues suspected of expressing N protein.

The invention also provides a substantially pure polypeptide characterized by having an
amino acid sequence derived from carboxy terminal amino acids of a hedgehog protein
and having at its amino terminus a G↓CF cleavage site specifically recognized by a
proteolytic activity of the carboxy terminal fragment of the native hedgehog polypeptide.
This fragment is denoted the C-terminal fragment or polypeptide or "C" herein. For
example, in Drosophila this "C" polypeptide derives from the C-terminal domain of
hedgehog precursor protein beginning at amino acid residue 258, wherein the full length
C-terminal domain has a molecular weight of about 25 kD by non-reducing SDS-PAGE,
has a histidine residue at position 72, and has protease activity. The G↓CF cleavage site
specifically recognized by the proteolytic activity of the carboxy terminal fragment of
the native hedgehog polypeptide is located at amino acid residues 257-259. As described
above for the N fragment, now that the present invention has shown the precise cleavage
recognition site for the autoproteolytic domain of hedgehog, those of skill in the art can
readily discern the cleavage site in other hedgehog proteins, thereby allowing the ready
identification of any N or C polypeptide of any hedgehog precursor protein.

The "C" polypeptide of the invention is derived from the C-terminus of a hedgehog
precursor protein, beginning at the autoproteolytic cleavage site identified at the GCF
amino acid sequence, which in Drosophila corresponds to amino acids 257-259. In
Drosophila the histidine residue found invariably at amino acid residue 329 of the native
hedgehog protein, and at amino acid residue 72 of the C polypeptide, is essential for
auto-proteolytic cleavage between amino acids 257 and 258 (G and C). Corresponding
C polypeptides of the invention will likewise contain a similarly located histidine residue
which can be readily identified, such as by comparison to the Drosophila C polypeptide.
Among various species, the proteolytic domain can be characterized by the amino acid
sequence -XTXXHLXX-.
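
The residue numbering and the sequence motif described above can be sketched in
Python as follows. This is a minimal illustration under stated assumptions: the C
polypeptide is taken to begin at residue 258 of the Drosophila precursor, so that
precursor residue 329 corresponds to position 72 of C, and "X" in -XTXXHLXX- is
read as any single amino acid. The function names and the toy sequence are
hypothetical.

```python
import re

# First precursor residue of the Drosophila C polypeptide, per the text.
C_START = 258


def precursor_to_c(residue: int) -> int:
    """Convert precursor numbering to C polypeptide numbering (1-based)."""
    return residue - C_START + 1


def find_motif(seq: str) -> list[int]:
    """Return 1-based start positions of -XTXXHLXX- matches in seq.

    Assumption of this sketch: X matches any single residue. A lookahead
    is used so that overlapping matches are also reported.
    """
    return [m.start() + 1 for m in re.finditer(r"(?=.T..HL..)", seq)]


print(precursor_to_c(329))         # histidine position within C
print(find_motif("AAATGGHLKKAA"))  # toy sequence, not a real protein
```

With these assumptions the conversion reproduces the correspondence stated in the
text between residue 329 of the native protein and residue 72 of the C polypeptide.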

The C polypeptide of the invention, unlike N, does not significantly bind to heparin
agarose. C is characterized by being released into the culture supernatant of cells
expressing C polypeptide in vitro and by being localized diffusely in cells and embryos.
Because C polypeptide diffuses freely, it would be detectable in various body fluids and
tissues in a subject. Identification of C polypeptide expression near the midline of the
neural tube, as described herein, provides a useful assay for neural tube closure in an
embryo/fetus, for example. The presence of C polypeptide in amniotic fluid would be
diagnostic of a disorder in which the neural tube may be malformed.

Altered levels of C polypeptide in cerebrospinal fluid may be indicative of
neurodegenerative disorders, for example. Because C polypeptide is released from the cell
after synthesis and autoproteolysis of native hedgehog precursor polypeptide, tumors
synthesizing and releasing high levels of C polypeptide would be detectable without prior
knowledge of the exact location of the tumor.

C fragment is effective in inducing genes of the pituitary and anterior brain as well. In
particular, induction is increased by the addition of a member of the TGF-β family of
growth factors. For example, human activin in combination with C fragment may be
effective in enhancing pituitary cell growth and activity or development. C fragment
possesses cholesterol transferase activity, thereby effecting precursor cleavage and
transfer of a cholesterol moiety to N fragment, resulting in a biologically active N
fragment.

C fragment is effective in inducing posterior markers of the brain by inhibiting N. Such
a fragment is exemplified in Example 18 as ΔN-C. Therefore, in another embodiment, the
invention includes a polypeptide in which amino acid residues 28-194 of X-bhh are deleted.
(Autoproteolysis gives a C domain of 198-409 as well as a seven amino acid peptide,
representing aa 24-27 and 195-197.) This polypeptide blocks the activity of X-bhh and
N in explants and reduces dorsoanterior structures in embryos. Also included are
polynucleotide sequences encoding ΔN-C. ΔN-C is useful for increasing expression of
posterior neural markers (e.g., En-2, Krox-20, XlHbox-6) and decreasing expression of
anterior neural markers (e.g., XANF-2, XAG-1, Otx-A) when desirable to do so to
modulate neural patterning.

The term "substantially pure" as used herein refers to hedgehog N or C polypeptide
which is substantially free of other proteins, lipids, carbohydrates, nucleic acids or other
materials with which it is naturally associated. One skilled in the art can purify hedgehog
N or C polypeptide using standard techniques for protein purification. The substantially
pure polypeptide will yield a single major band on a non-reducing polyacrylamide gel.
The purity of the hedgehog N or C polypeptide can also be determined by
amino-terminal amino acid sequence analysis.

The invention includes a functional N or C polypeptide, and functional fragments
thereof. As used herein, the term "functional polypeptide" or "functional fragment"
refers to a polypeptide which possesses a biological function or activity which is
identified through a defined functional assay and which is associated with a particular
biologic, morphologic, or phenotypic alteration in the cell. Functional fragments of the
hedgehog N or C polypeptide include fragments of N or C polypeptide as long as the
activity, e.g., proteolytic activity or cholesterol transferase activity of C polypeptide,
remains. Smaller peptides containing the biological activity of N or C polypeptide are
therefore included in the invention. The biological function, for example, can vary from
a polypeptide fragment as small as an epitope to which an antibody molecule can bind
to a large polypeptide which is capable of participating in the characteristic induction or
programming of phenotypic changes within a cell. A "functional polynucleotide"
denotes a polynucleotide which encodes a functional polypeptide as described herein.

Biologically active or functional fragments of hedgehog, as described herein, are
included in the invention and can be identified as such by functional assays. For
example, fragments of hedgehog are identified as inducing differentiation of neuronal
cells; regulating differentiation of chondrocytes; being able to complement a loss of
function mutation of hedgehog, for example in a transgenic Drosophila; binding to
Patched (Ptc); or having cholesterol transferase activity (e.g., C fragment). Fragments
of the invention may be from about 30 to 450 amino acids in length; from about 50 to 300
amino acids in length; from about 75 to 250 amino acids in length; or from about 100 to
200 amino acids in length, as long as a biological activity of hedgehog is retained therein.

Minor modifications of the N or C polypeptide primary amino acid sequence may result
in polypeptides which have substantially equivalent activity as compared to the N or C
polypeptide described herein. Such modifications may be deliberate, as by site-directed
mutagenesis, or may be spontaneous. All of the polypeptides produced by these
modifications are included herein as long as the proteolytic activity of C polypeptide,
for example, is present. Further, deletion of one or more amino acids can also result in
a modification of the structure of the resultant molecule without significantly altering its
activity. This can lead to the development of a smaller active molecule which would
have broader utility. For example, it is possible to remove amino or carboxy terminal
amino acids which may not be required for N or C polypeptide activity.

The N or C polypeptide of the invention also includes conservative variations of the
polypeptide sequence. The term "conservative variation" as used herein denotes the
replacement of an amino acid residue by another biologically similar residue. Examples
of conservative variations include the substitution of one hydrophobic residue such as
isoleucine, valine, leucine or methionine for another, or the substitution of one polar
residue for another, such as the substitution of arginine for lysine, glutamic for aspartic
acid, or glutamine for asparagine, and the like. The term "conservative variation" also
includes the use of a substituted amino acid in place of an unsubstituted parent amino
acid provided that antibodies raised to the substituted polypeptide also immunoreact with
the unsubstituted polypeptide.
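
By way of a hedged illustration only, the example substitutions named above can be
encoded as a small lookup in Python. The groupings below contain only the residues
listed in the text (one-letter codes); the phrase "and the like" indicates that the
list is not exhaustive, so a result of False from this sketch does not mean that a
substitution falls outside the definition. The names CONSERVATIVE_GROUPS and
is_conservative are hypothetical.

```python
# Groups taken from the examples in the text (one-letter amino acid codes).
CONSERVATIVE_GROUPS = [
    set("IVLM"),  # hydrophobic: isoleucine, valine, leucine, methionine
    set("RK"),    # arginine / lysine
    set("DE"),    # aspartic acid / glutamic acid
    set("QN"),    # glutamine / asparagine
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in one of the example groups above."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


print(is_conservative("I", "V"))
print(is_conservative("G", "W"))
```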

The N fragment of the invention includes both the active form of the polypeptide and the
N fragment including the uncleaved signal sequence. For example, in Drosophila where
the signal sequence is internal (at about amino acids 60-80), the entire uncleaved N
fragment beginning at the initiating methionine is included in the invention. Those of
skill in the art can readily ascertain the nature and location of the signal sequence by
using, for example, the algorithm described in von Heijne, G., Nucl. Acids Res. 14:4683
(1986).

Hedgehog polypeptides of the invention include polypeptides having at least about
50%-100% homology with the hedgehog polypeptides provided herein, for example 52%,
64%, 68%, 70%, 75%, 80%, 85%, 90%, 95% and up to 100% homology. Preferably
homologous polypeptides are derived from vertebrate species, most preferably
mammalian species, such as humans.
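
The text does not define how percent homology is to be computed. One common reading,
adopted here purely as an assumption, is percent identity over the aligned positions
of two sequences that have already been aligned to equal length, with "-" marking
gaps. A minimal Python sketch under that assumption follows; the function name
percent_identity is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over ungapped columns of two pre-aligned sequences.

    Both inputs must have equal length; '-' marks an alignment gap.
    Columns containing a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned sequences, not real hedgehog proteins.
print(percent_identity("ACDE", "ACDF"))
```

Other definitions (for example, scoring conservative substitutions as similar, or
counting gapped columns in the denominator) would give different values.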

The invention also provides an isolated polynucleotide sequence encoding a polypeptide
having the amino acid sequence of N or C polypeptide of the invention. The term
"isolated" as used herein includes polynucleotides substantially free of other nucleic
acids, proteins, lipids, carbohydrates or other materials with which they are naturally
associated. Polynucleotide sequences of the invention include DNA, cDNA and RNA
sequences which encode N or C polypeptide. It is understood that all polynucleotides
encoding all or a portion of N or C polypeptide are also included herein, as long as they
encode a polypeptide with N or C polypeptide activity. Such polynucleotides include
naturally occurring, synthetic, and intentionally manipulated polynucleotides. For
example, N or C polypeptide polynucleotide may be subjected to site-directed
mutagenesis. The polynucleotide sequence for N or C polypeptide also includes
antisense sequences. The polynucleotides of the invention include sequences that are
degenerate as a result of the genetic code. There are 20 natural amino acids, most of
which are specified by more than one codon. Therefore, all degenerate nucleotide
sequences are included in the invention as long as the amino acid sequence of N or C
polypeptide encoded by the nucleotide sequence is functionally unchanged.
In addition, the invention also includes a polynucleotide consisting essentially of a
polynucleotide sequence encoding a polypeptide having an amino acid sequence of N or
C and having at least one epitope for an antibody immunoreactive with N or C
polypeptide.

The polynucleotide encoding N or C polypeptide includes sequences encoding the entire
polypeptide or fragments thereof, as well as nucleic acid sequences complementary to
that sequence. A complementary sequence may include an antisense nucleotide. When the
sequence is RNA, the deoxynucleotides A, G, C, and T are replaced by ribonucleotides
A, G, C, and U, respectively. Also included in the invention are fragments of the
above-described nucleic acid sequences that are at least 15 bases in length, which is
sufficient to permit the fragment to selectively hybridize to DNA that encodes the
protein under physiological conditions.
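
The relationships described above between a DNA sequence, its complement, and the
corresponding RNA can be sketched in Python as follows. This is an illustrative
sketch only: the function names are hypothetical, sequences are assumed to contain
only the unambiguous bases A, C, G and T, and the 15-base threshold is simply the
minimum fragment length stated in the text.

```python
MIN_FRAGMENT_LENGTH = 15  # minimum fragment length stated in the text

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complementary strand of a DNA sequence, written 5' to 3'."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """Replace T with U, as when the sequence is RNA."""
    return dna.upper().replace("T", "U")


def antisense_rna(coding_dna: str) -> str:
    """RNA complementary to the mRNA transcribed from coding_dna."""
    return to_rna(reverse_complement(coding_dna))


def meets_minimum_length(fragment: str) -> bool:
    """True if the fragment is at least 15 bases long."""
    return len(fragment) >= MIN_FRAGMENT_LENGTH


print(reverse_complement("ATGGCC"))
print(antisense_rna("ATGGCC"))
```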

Hedgehog encoding polynucleotides of the invention include nucleic acid sequences
identified by hybridization to a hedgehog nucleic acid described herein. In nucleic acid
hybridization reactions, the conditions used to achieve a particular level of stringency
will vary, depending on the nature of the nucleic acids being hybridized. For example,
the length, degree of complementarity, nucleotide sequence composition (e.g., GC v. AT
content), and nucleic acid type (e.g., RNA v. DNA) of the hybridizing regions of the
nucleic acids can be considered in selecting hybridization conditions. An additional
consideration is whether one of the nucleic acids is immobilized, for example, on a filter.

An example of progressively higher stringency conditions is as follows: 2 x SSC/0.1%
SDS at about room temperature (hybridization conditions); 0.2 x SSC/0.1% SDS at about
room temperature (low stringency conditions); 0.2 x SSC/0.1% SDS at about 42°C
(moderate stringency conditions); and 0.1 x SSC at about 68°C (high stringency
conditions). Washing can be carried out using only one of these conditions, e.g., high
stringency conditions, or each of the conditions can be used, e.g., for 10-15 minutes each,
in the order listed above, repeating any or all of the steps listed. However, as mentioned
above, optimal conditions will vary, depending on the particular hybridization reaction
involved, and can be determined empirically.

DNA sequences of the invention can be obtained by several methods. For example, the
DNA can be isolated using hybridization techniques which are well known in the art.
These include, but are not limited to: 1) hybridization of genomic or cDNA libraries with
probes to detect homologous nucleotide sequences; 2) antibody screening of expression
libraries to detect cloned DNA fragments with shared structural features; and 3) PCR
amplification of a desired nucleotide sequence using oligonucleotide primers.

Preferably the hedgehog, N, or C polynucleotide of the invention is derived from a
vertebrate organism, and most preferably from human. Screening procedures which rely
on nucleic acid hybridization make it possible to isolate any gene sequence from any
organism, provided the appropriate probe is available. Oligonucleotide probes, which
correspond to a part of the sequence encoding the protein in question, can be synthesized
chemically. This requires that short, oligopeptide stretches of amino acid sequence
be known. The DNA sequence encoding the protein can be deduced from the genetic
code; however, the degeneracy of the code must be taken into account. It is possible to
perform a mixed addition reaction when the sequence is degenerate. This includes a
heterogeneous mixture of denatured double-stranded DNA. For such screening,
hybridization is preferably performed on either single-stranded DNA or denatured
double-stranded DNA. Hybridization is particularly useful in the detection of cDNA
clones derived from sources where an extremely low amount of mRNA sequences
relating to the polypeptide of interest is present. In other words, by using stringent
hybridization conditions directed to avoid non-specific binding, it is possible, for
example, to allow the autoradiographic visualization of a specific cDNA clone by the
hybridization of the target DNA to that single probe in the mixture which is its complete
complement (Wallace, et al., Nucl. Acid Res., 9:879, 1981).
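
To illustrate why degeneracy of the code must be taken into account when a probe is
deduced from a short peptide, the following Python sketch enumerates every DNA
sequence that could encode a given peptide under the standard genetic code. It is a
sketch under stated assumptions: the standard code is assumed to apply, the function
names are hypothetical, and the example peptide (the tripeptide G-C-F) is chosen only
because it appears in the text, not because it is proposed as a probe.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code in TCAG order (first base varies slowest);
# "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (one-letter code) to the list of codons encoding it.
CODONS: dict[str, list[str]] = {}
for bases, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS.setdefault(amino_acid, []).append("".join(bases))


def degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that encode the peptide."""
    return prod(len(CODONS[aa]) for aa in peptide)


def probe_mixture(peptide: str) -> list[str]:
    """All DNA sequences encoding the peptide (the mixed probe pool)."""
    return ["".join(codons) for codons in product(*(CODONS[aa] for aa in peptide))]


print(degeneracy("GCF"))
print(probe_mixture("MW"))
```

Because the pool size is the product of the codon counts at each position, peptides
rich in methionine and tryptophan (one codon each) give the smallest mixtures.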

Specific DNA sequences encoding hedgehog can also be obtained
by: 1) isolation of double-stranded DNA sequences from the genomic DNA; 2) chemical
manufacture of a DNA sequence to provide the necessary codons for the polypeptide of
interest; and 3) in vitro synthesis of a double-stranded DNA sequence by reverse
transcription of mRNA isolated from a eukaryotic donor cell. In the latter case, a
double-stranded DNA complement of mRNA is eventually formed which is generally
referred to as cDNA.

Of the three above-noted methods for developing specific DNA sequences for use in
recombinant procedures, the isolation of genomic DNA isolates is the least common.
This is especially true when it is desirable to obtain the microbial expression of
mammalian polypeptides, because of the presence of introns.

The synthesis of DNA sequences is frequently the method of choice when the entire
sequence of amino acid residues of the desired polypeptide product is known. When the
entire sequence of amino acid residues of the desired polypeptide is not known, the direct
synthesis of DNA sequences is not possible and the method of choice is the synthesis of
cDNA sequences. Among the standard procedures for isolating cDNA sequences of
interest is the formation of plasmid- or phage-carrying cDNA libraries which are derived
from reverse transcription of mRNA which is abundant in donor cells that have a high
level of genetic expression. When used in combination with polymerase chain reaction
technology, even rare expression products can be cloned. In those cases where
significant portions of the amino acid sequence of the polypeptide are known, the
production of labeled single or double-stranded DNA or RNA probe sequences
duplicating a sequence putatively present in the target cDNA may be employed in
DNA/DNA hybridization procedures which are carried out on cloned copies of the
cDNA which have been denatured into a single-stranded form (Jay, et al., Nucl. Acid
Res., 11:2325, 1983).

A preferred method for obtaining genomic DNA, for example, is Polymerase Chain
Reaction (PCR), which relies on an in vitro method of nucleic acid synthesis by which
a particular segment of DNA is specifically replicated. Two oligonucleotide primers that
flank the DNA fragment to be amplified are utilized in repeated cycles of heat
denaturation of the DNA, annealing of the primers to their complementary sequences,
and extension of the annealed primers with DNA polymerase. These primers hybridize
to opposite strands of the target sequence and are oriented so that DNA synthesis by the
polymerase proceeds across the region between the primers. Since the extension
products themselves are also complementary to and capable of binding primers,
successive cycles of amplification essentially double the amount of the target DNA
synthesized in the previous cycle. The result is an exponential accumulation of the
specific target fragment, approximately 2^n, where n is the number of cycles of
amplification performed (see PCR Protocols, Eds. Innis, et al., Academic Press, Inc.,
1990, incorporated herein by reference).
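
The approximately 2^n accumulation described above can be made concrete with a short
Python calculation. The sketch below is illustrative only: the function name
pcr_copies is hypothetical, and the per-cycle efficiency parameter is an added
assumption (a value of 1.0 corresponds to the ideal doubling described in the text;
real reactions typically fall short of this and eventually plateau).

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Theoretical copy number after a given number of PCR cycles.

    efficiency is the assumed fraction of templates copied per cycle;
    1.0 gives ideal doubling, i.e. initial_copies * 2**cycles.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Ideal amplification of a single template over 30 cycles.
print(pcr_copies(1, 30))
```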

A cDNA expression library, such as λgt11, can be screened indirectly for hedgehog, N,
or C polypeptides having at least one epitope, using antibodies specific for hedgehog, N,
or C. Such antibodies can be either polyclonally or monoclonally derived and used to
detect expression product indicative of the presence of the desired hedgehog cDNA.

The polynucleotide sequence for hedgehog, N, or C, also includes sequences
complementary to the polynucleotide encoding hedgehog, N or C (antisense sequences).
Antisense nucleic acids are DNA or RNA molecules that are complementary to at least
a portion of a specific mRNA molecule (Weintraub, Scientific American, 262:40, 1990).
The invention embraces all antisense polynucleotides capable of inhibiting production
of hedgehog, N, or C polypeptide. In the cell, the antisense nucleic acids hybridize to the
corresponding mRNA, forming a double-stranded molecule. The antisense nucleic acids
interfere with the translation of the mRNA since the cell will not translate an mRNA that
is double-stranded. Antisense oligomers of about 15 nucleotides are preferred, since they
are easily synthesized and are less likely to cause problems than larger molecules when
introduced into the target hedgehog, N, or C-producing cell. The use of antisense
methods to inhibit the translation of genes is well known in the art (Marcus-Sakura, Anal.
Biochem., 172:289, 1988). Inhibition of target nucleotide would be desirable, for
example, in inhibiting cell-proliferative disorders, such as certain tumors, which are
mediated by hedgehog, N or C.

In addition, ribozyme nucleotide sequences for hedgehog, N or C are included in the
invention. Ribozymes are RNA molecules possessing the ability to specifically cleave
other single-stranded RNA in a manner analogous to DNA restriction endonucleases.
Through the modification of nucleotide sequences which encode these RNAs, it is
possible to engineer molecules that recognize specific nucleotide sequences in an RNA
molecule and cleave it (Cech, J. Amer. Med. Assn., 260:3030, 1988). A major advantage
of this approach is that, because they are sequence-specific, only mRNAs with particular
sequences are inactivated.

There are two basic types of ribozymes, namely, tetrahymena-type (Hasselhoff, Nature,
334:585, 1988) and "hammerhead"-type. Tetrahymena-type ribozymes recognize
sequences which are four bases in length, while "hammerhead"-type ribozymes recognize
base sequences 11-18 bases in length. The longer the recognition sequence, the greater
the likelihood that sequence will occur exclusively in the target mRNA species.
Consequently, hammerhead-type ribozymes are preferable to tetrahymena-type ribozymes for
inactivating a specific mRNA species and 18-base recognition sequences are preferable
to shorter recognition sequences.
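
The preference for longer recognition sequences can be made roughly quantitative. The following Python sketch assumes, purely for illustration, that the four bases occur independently and with equal frequency, so that a given site of n bases is expected to occur by chance about (L - n + 1) / 4^n times in L nucleotides of RNA. The function name and the total RNA length of one million nucleotides are hypothetical.

```python
def expected_chance_matches(site_length: int, rna_length: int) -> float:
    """Expected chance occurrences of one specific site in random RNA.

    Assumes independent, equally frequent bases (an illustrative assumption).
    """
    positions = max(rna_length - site_length + 1, 0)  # possible start positions
    return positions / 4 ** site_length

# Compare a four-base site with 11- and 18-base sites.
for n in (4, 11, 18):
    print(n, expected_chance_matches(n, 1_000_000))
```

Under these assumptions a four-base site is expected to recur thousands of times by chance, whereas an 18-base site is expected to recur far less than once, which is consistent with the preference stated above.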

DNA sequences encoding hedgehog, N or C can be expressed in vitro by DNA transfer 
into a suitable host cell. "Host cells" are cells in which a vector can be propagated and 
its DNA expressed. The term also includes any progeny of the subject host cell. It is 
understood that all progeny may not be identical to the parental cell since there may be
mutations that occur during replication. However, such progeny are included when the 
term "host cell" is used. Methods of stable transfer, meaning that the foreign DNA is 
continuously maintained in the host, are known in the art. 

In the present invention, the hedgehog, N or C polynucleotide sequences may be inserted 
into a recombinant expression vector. The term "recombinant expression vector" refers
to a plasmid, virus or other vehicle known in the art that has been manipulated by
insertion or incorporation of the hedgehog, N or C genetic sequences. Such expression
vectors contain a promoter sequence which facilitates the efficient transcription of the
inserted genetic sequence of the host. The expression vector typically contains an origin
of replication, a promoter, as well as specific genes which allow phenotypic selection of
the transformed cells. Vectors suitable for use in the present invention include, but are
not limited to, the T7-based expression vector for expression in bacteria (Rosenberg, et
al., Gene, 56:125, 1987), the pMSXND expression vector for expression in mammalian
cells (Lee and Nathans, J. Biol. Chem., 263:3521, 1988) and baculovirus-derived vectors
for expression in insect cells. The DNA segment can be present in the vector operably
linked to regulatory elements, for example, a promoter (e.g., T7, metallothionein I, or
polyhedrin promoters).

Polynucleotide sequences encoding hedgehog, N or C can be expressed in either 
prokaryotes or eukaryotes, although post-translational modification of eukaryotically 
derived polypeptides, such as carboxylation, would occur in a eukaryotic host. Hosts can 
include microbial, yeast, insect and mammalian organisms. Methods of expressing DNA 
sequences having eukaryotic or viral sequences in prokaryotes are well known in the art.
Biologically functional viral and plasmid DNA vectors capable of expression and 
replication in a host are known in the art. Such vectors are used to incorporate DNA 
sequences of the invention. 

Methods which are well known to those skilled in the art can be used to construct 
expression vectors containing the hedgehog, N or C coding sequence and appropriate
transcriptional/translational control signals. These methods include in vitro recombinant
DNA techniques, synthetic techniques, and in vivo recombination/genetic techniques.
See, for example, the techniques described in Maniatis, et al., 1989, Molecular Cloning:
A Laboratory Manual, Cold Spring Harbor Laboratory, N.Y.

A variety of host-expression vector systems may be utilized to express the hedgehog, N
or C coding sequence. These include but are not limited to microorganisms such as
bacteria transformed with recombinant bacteriophage DNA, plasmid DNA or cosmid
DNA expression vectors containing the hedgehog, N or C coding sequence; yeast
transformed with recombinant yeast expression vectors containing the hedgehog, N or
C coding sequence; plant cell systems infected with recombinant virus expression vectors
(e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or transformed with
recombinant plasmid expression vectors (e.g., Ti plasmid) containing the hedgehog, N
or C coding sequence; insect cell systems infected with recombinant virus expression
vectors (e.g., baculovirus) containing the hedgehog, N or C coding sequence; or animal
cell systems infected with recombinant virus expression vectors (e.g., retroviruses,
adenovirus, vaccinia virus) containing the hedgehog, N or C coding sequence, or
transformed animal cell systems engineered for stable expression.

Depending on the host/vector system utilized, any of a number of suitable transcription 
and translation elements, including constitutive and inducible promoters, transcription 
enhancer elements, transcription terminators, etc., may be used in the expression vector 
(see e.g., Bitter, et al., 1987, Methods in Enzymology, 153:516-544). For example, when
cloning in bacterial systems, inducible promoters such as pL of bacteriophage λ, plac,
ptrp, ptac (ptrp-lac hybrid promoter) and the like may be used. When cloning in
mammalian cell systems, promoters derived from the genome of mammalian cells (e.g.,
metallothionein promoter) or from mammalian viruses (e.g., the retrovirus long terminal
repeat; the adenovirus late promoter; the vaccinia virus 7.5K promoter) may be used.
Promoters produced by recombinant DNA or synthetic techniques may also be used to
provide for transcription of the inserted hedgehog, N or C coding sequence. 

In bacterial systems a number of expression vectors may be advantageously selected 
depending upon the use intended for the expressed polypeptide. For example, when large
quantities of hedgehog, N or C are to be produced, vectors which direct the expression of high
levels of fusion protein products that are readily purified may be desirable. Those which
are engineered to contain a cleavage site to aid in recovery are preferred. Such vectors
include but are not limited to the E. coli expression vector pUR278 (Ruther, et al., EMBO
J., 2:1791, 1983), in which the hedgehog, N or C coding sequence may be ligated into
the vector in frame with the lac Z coding region so that a hybrid lac Z protein is
produced; pIN vectors (Inouye and Inouye, Nucleic Acids Res., 13:3101, 1985; Van
Heeke and Schuster, J. Biol. Chem., 264:5503, 1989) and the like.
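
The requirement that a coding sequence be ligated "in frame" can be illustrated with a minimal Python sketch. It treats a fusion as in frame when the upstream coding region ends on a codon boundary, the fused sequence is a whole number of codons, and no stop codon occurs before the final codon. The function name and the short example sequences are hypothetical and do not represent pUR278, lac Z or any hedgehog sequence.

```python
# Illustrative sketch only: sequences and names are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def fusion_is_in_frame(upstream: str, downstream: str) -> bool:
    """Check that joining two DNA coding segments preserves one reading frame."""
    fused = (upstream + downstream).upper()
    # The junction must fall on a codon boundary and the whole fusion
    # must consist of complete codons.
    if len(upstream) % 3 != 0 or len(fused) % 3 != 0:
        return False
    triplets = [fused[i:i + 3] for i in range(0, len(fused), 3)]
    # A stop codon is tolerated only as the very last codon of the fusion.
    return not any(t in STOP_CODONS for t in triplets[:-1])

print(fusion_is_in_frame("ATGACC", "GGATGCTTCTAA"))   # junction on a codon boundary
print(fusion_is_in_frame("ATGACCA", "GGATGCTTCTAA"))  # one extra base shifts the frame
```

A single extra or missing base at the junction shifts every downstream codon, which is why a hybrid protein is produced only when the frame is preserved.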

In yeast, a number of vectors containing constitutive or inducible promoters may be used.
For a review see, Current Protocols in Molecular Biology, Vol. 2, 1988, Ed. Ausubel,
et al., Greene Publish. Assoc. & Wiley Interscience, Ch. 13; Grant, et al., 1987,
Expression and Secretion Vectors for Yeast, in Methods in Enzymology, Eds. Wu and
Grossman, 1987, Acad. Press, N.Y., Vol. 153, pp. 516-544; Glover, 1986, DNA
Cloning, Vol. II, IRL Press, Wash., D.C., Ch. 3; and Bitter, 1987, Heterologous Gene
Expression in Yeast, Methods in Enzymology, Eds. Berger and Kimmel, Acad. Press,
N.Y., Vol. 152, pp. 673-684; and The Molecular Biology of the Yeast Saccharomyces,
1982, Eds. Strathern, et al., Cold Spring Harbor Press, Vols. I and II. A constitutive
yeast promoter such as ADH or LEU2 or an inducible promoter such as GAL may be
used (Cloning in Yeast, Ch. 3, R. Rothstein In: DNA Cloning Vol. II, A Practical
Approach, Ed. DM Glover, 1986, IRL Press, Wash., D.C.). Alternatively, vectors may
be used which promote integration of foreign DNA sequences into the yeast
chromosome.

In cases where plant expression vectors are used, the expression of the hedgehog, N or 
C coding sequence may be driven by any of a number of promoters. For example, viral 
promoters such as the 35S RNA and 19S RNA promoters of CaMV (Brisson, et al.,
Nature, 310:511, 1984), or the coat protein promoter of TMV (Takamatsu, et al., EMBO
J., 6:307, 1987) may be used; alternatively, plant promoters such as the small subunit of
RUBISCO (Coruzzi, et al., EMBO J., 3:1671-1680, 1984; Broglie, et al., Science,
224:838, 1984); or heat shock promoters, e.g., soybean hsp17.5-E or hsp17.3-B (Gurley,
et al., Mol. Cell. Biol., 6:559, 1986) may be used. These constructs can be introduced
into plant cells using Ti plasmids, Ri plasmids, plant virus vectors, direct DNA
transformation, microinjection, electroporation, etc. For reviews of such techniques see,
for example, Weissbach and Weissbach, 1988, Methods for Plant Molecular Biology,
Academic Press, NY, Section VIII, pp. 421-463; and Grierson and Corey, 1988, Plant
Molecular Biology, 2d Ed., Blackie, London, Ch. 7-9.

An alternative expression system which could be used to express hedgehog, N or C is an
insect system. In one such system, Autographa californica nuclear polyhedrosis virus
(AcNPV) is used as a vector to express foreign genes. The virus grows in Spodoptera
frugiperda cells. The hedgehog, N or C coding sequence may be cloned into non-essential
regions (for example the polyhedrin gene) of the virus and placed under control of an AcNPV
promoter (for example the polyhedrin promoter). Successful insertion of the hedgehog,
N or C coding sequence will result in inactivation of the polyhedrin gene and production
of non-occluded recombinant virus (i.e., virus lacking the proteinaceous coat coded for
by the polyhedrin gene). These recombinant viruses are then used to infect Spodoptera
frugiperda cells in which the inserted gene is expressed (e.g., see Smith, et al., J. Virol.,
46:584, 1983; Smith, U.S. Patent No. 4,215,051).

Eukaryotic systems, and preferably mammalian expression systems, allow for proper 
post-translational modifications of expressed mammalian proteins to occur. Eukaryotic
cells which possess the cellular machinery for proper processing of the primary
transcript, glycosylation, phosphorylation, and advantageously, secretion of the gene
product may be used as host cells for the expression of hedgehog, N or C. Mammalian
cell lines may be preferable. Such host cell lines may include but are not limited to
CHO, VERO, BHK, HeLa, COS, MDCK, 293, and WI38.

Mammalian cell systems which utilize recombinant viruses or viral elements to direct 
expression may be engineered. For example, when using adenovirus expression vectors, 
the hedgehog, N or C coding sequence may be ligated to an adenovirus
transcription/translation control complex, e.g., the late promoter and tripartite leader sequence. This
chimeric gene may then be inserted in the adenovirus genome by in vitro or in vivo
recombination. Insertion in a non-essential region of the viral genome (e.g., region E1
or E3) will result in a recombinant virus that is viable and capable of expressing the
protein in infected hosts (e.g., see Logan and Shenk, Proc. Natl. Acad. Sci. USA,
81:3655, 1984). Alternatively, the vaccinia virus 7.5K promoter may be used (e.g., see
Mackett, et al., Proc. Natl. Acad. Sci. USA, 79:7415, 1982; Mackett, et al., J. Virol.,
49:857, 1984; Panicali, et al., Proc. Natl. Acad. Sci. USA, 79:4927, 1982). Of particular
interest are vectors based on bovine papilloma virus which have the ability to replicate
as extrachromosomal elements (Sarver, et al., Mol. Cell. Biol., 1:486, 1981). Shortly
after entry of this DNA into mouse cells, the plasmid replicates to about 100 to 200
copies per cell. Transcription of the inserted cDNA does not require integration of the
plasmid into the host's chromosome, thereby yielding a high level of expression. These
vectors can be used for stable expression by including a selectable marker in the plasmid,
such as, for example, the neo gene. Alternatively, the retroviral genome can be modified
for use as a vector capable of introducing and directing the expression of the hedgehog,
N or C gene in host cells (Cone and Mulligan, Proc. Natl. Acad. Sci. USA, 81:6349,
1984). High level expression may also be achieved using inducible promoters, including,
but not limited to, the metallothionein IIA promoter and heat shock promoters.

For long-term, high-yield production of recombinant proteins, stable expression is 
preferred. Rather than using expression vectors which contain viral origins of 
replication, host cells can be transformed with the hedgehog, N or C cDNA controlled 
by appropriate expression control elements (e.g., promoter, enhancer sequences,
transcription terminators, polyadenylation sites, etc.), and a selectable marker. The
selectable marker in the recombinant plasmid confers resistance to the selection and
allows cells to stably integrate the plasmid into their chromosomes and grow to form foci
which in turn can be cloned and expanded into cell lines. For example, following the
introduction of foreign DNA, engineered cells may be allowed to grow for 1-2 days in
an enriched media, and then are switched to a selective media. A number of selection
systems may be used, including but not limited to the herpes simplex virus thymidine
kinase (Wigler, et al., Cell, 11:223, 1977), hypoxanthine-guanine
phosphoribosyltransferase (Szybalska and Szybalski, Proc. Natl. Acad. Sci. USA,
48:2026, 1962), and adenine phosphoribosyltransferase (Lowy, et al., Cell, 22:817,
1980) genes, which can be employed in tk-, hgprt- or aprt- cells respectively. Also,
antimetabolite resistance can be used as the basis of selection for dhfr, which confers
resistance to methotrexate (Wigler, et al., Proc. Natl. Acad. Sci. USA, 77:3567, 1980;
O'Hare, et al., Proc. Natl. Acad. Sci. USA, 78:1527, 1981); gpt, which confers resistance
to mycophenolic acid (Mulligan and Berg, Proc. Natl. Acad. Sci. USA, 78:2072, 1981);
neo, which confers resistance to the aminoglycoside G-418 (Colberre-Garapin, et al.,
J. Mol. Biol., 150:1, 1981); and hygro, which confers resistance to hygromycin (Santerre,
et al., Gene, 30:147, 1984) genes. Recently, additional selectable genes have been
described, namely trpB, which allows cells to utilize indole in place of tryptophan; hisD,
which allows cells to utilize histidinol in place of histidine (Hartman and Mulligan,
Proc. Natl. Acad. Sci. USA, 85:8047, 1988); and ODC (ornithine decarboxylase) which
confers resistance to the ornithine decarboxylase inhibitor,
2-(difluoromethyl)-DL-ornithine, DFMO
(McConlogue L., 1987, In: Current Communications in Molecular Biology, Cold Spring
Harbor Laboratory ed.).

Transformation of a host cell with recombinant DNA may be carried out by conventional
techniques as are well known to those skilled in the art. Where the host is prokaryotic,
such as E. coli, competent cells which are capable of DNA uptake can be prepared from
cells harvested after exponential growth phase and subsequently treated by the CaCl2
method using procedures well known in the art. Alternatively, MgCl2 or RbCl can be
used. Transformation can also be performed after forming a protoplast of the host cell
if desired.

When the host is a eukaryote, such methods of transfection of DNA as calcium phosphate
co-precipitates, conventional mechanical procedures such as microinjection,
electroporation, insertion of a plasmid encased in liposomes, or virus vectors may be used.
Eukaryotic cells can also be cotransformed with DNA sequences encoding the hedgehog,
N or C of the invention, and a second foreign DNA molecule encoding a selectable 
phenotype, such as the herpes simplex thymidine kinase gene. Another method is to use 
a eukaryotic viral vector, such as simian virus 40 (SV40) or bovine papilloma virus, to 



WO 98/30576 



PCT/US97/15753 




transiently infect or transform eukaryotic cells and express the protein (see, for example,
Eukaryotic Viral Vectors, Cold Spring Harbor Laboratory, Gluzman ed., 1982).

Isolation and purification of microbial expressed polypeptide, or fragments thereof, 
provided by the invention, may be carried out by conventional means including 
preparative chromatography and immunological separations involving monoclonal or
polyclonal antibodies. 

The invention includes antibodies immunoreactive with or which bind to hedgehog, N 
or C polypeptide or functional fragments thereof. Antibody which consists essentially
of pooled monoclonal antibodies with different epitopic specificities, as well as distinct
monoclonal antibody preparations are provided. Monoclonal antibodies are made from
antigen containing fragments of the protein by methods well known to those skilled in
the art (Kohler, et al., Nature, 256:495, 1975). The term antibody as used in this
invention is meant to include intact molecules as well as fragments thereof, such as Fab
and F(ab')2, which are capable of binding an epitopic determinant on hedgehog, N or C.
The antibodies of the invention include antibodies which bind to the N or C polypeptide
and which bind with immunoreactive fragments of N or C.

The term "antibody" as used in this invention includes intact molecules as well as 
fragments thereof, such as Fab, F(ab')2, and Fv which are capable of binding the epitopic
determinant. These antibody fragments retain some ability to selectively bind with their
antigen or receptor and are defined as follows:

(1) Fab, the fragment which contains a monovalent antigen-binding fragment of
    an antibody molecule, can be produced by digestion of whole antibody with
    the enzyme papain to yield an intact light chain and a portion of one heavy
    chain;

(2) Fab', the fragment of an antibody molecule that can be obtained by treating whole
    antibody with pepsin, followed by reduction, to yield an intact light chain and
    a portion of the heavy chain; two Fab' fragments are obtained per antibody
    molecule;

(3) F(ab')2, the fragment of the antibody that can be obtained by treating whole
    antibody with the enzyme pepsin without subsequent reduction; F(ab')2 is a
    dimer of two Fab' fragments held together by two disulfide bonds;

(4) Fv, defined as a genetically engineered fragment containing the variable
    genetically fused single chain molecule.




Methods of making these fragments are known in the art. (See for example, Harlow and
Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, New York
(1988), incorporated herein by reference).

As used in this invention, the term "epitope" means any antigenic determinant on an 
antigen to which the paratope of an antibody binds. Epitopic determinants usually 
consist of chemically active surface groupings of molecules such as amino acids or sugar
side chains and usually have specific three dimensional structural characteristics, as well 
as specific charge characteristics. 

Antibodies which bind to the hedgehog, N or C polypeptide of the invention can be 
prepared using an intact polypeptide or fragments containing small peptides of interest
as the immunizing antigen. The polypeptide such as N or C, or fragments thereof used
to immunize an animal can be derived from translated cDNA or chemical synthesis
which can be conjugated to a carrier protein, if desired. Such commonly used carriers
which are chemically coupled to the peptide include keyhole limpet hemocyanin (KLH),
thyroglobulin, bovine serum albumin (BSA), and tetanus toxoid. The coupled peptide
is then used to immunize the animal (e.g., a mouse, a rat, or a rabbit).

If desired, polyclonal or monoclonal antibodies can be further purified, for example, by
binding to and elution from a matrix to which the polypeptide or a peptide to which the
antibodies were raised is bound. Those of skill in the art will know of various techniques
common in the immunology arts for purification and/or concentration of polyclonal
antibodies, as well as monoclonal antibodies (See for example, Coligan, et al., Unit 9,
Current Protocols in Immunology, Wiley Interscience, 1991, incorporated by reference).
It is also possible to use the anti-idiotype technology to produce monoclonal antibodies
which mimic an epitope. For example, an anti-idiotypic monoclonal antibody made to
a first monoclonal antibody will have a binding domain in the hypervariable region
which is the "image" of the epitope bound by the first monoclonal antibody.

Antibodies as described herein as having specificity for N polypeptide, e.g., Ab1
(residues 83-160), are useful for specific identification of cells or tissues expressing the
N fragment of hedgehog. Similarly, antibodies described herein as having specificity for
C polypeptide, e.g., Ab2 (residues 300-391), are useful for specific identification of cells
or tissues expressing the C fragment of hedgehog. Both antibodies, naturally, will also
detect native hedgehog polypeptide.

The N and C-specific antibodies of the invention are useful for purification of N and C 
polypeptide, respectively, especially using the antibodies immobilized on solid phase. 
By contacting a sample with anti-N antibody, both N and native hedgehog polypeptides 
can be isolated. By next contacting the sample removed by anti-N antibodies with
anti-C antibodies, the native hedgehog polypeptide is removed, thus allowing purification of
N polypeptide. In a similar manner, C polypeptide can be antibody purified from a
sample. 
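
The logic of the two-step separation described above can be sketched in Python using sets. The sketch is illustrative only: each species is modeled simply by the domains it contains, every antibody is assumed to capture all species containing its domain, and the dictionary and function names are hypothetical.

```python
# Illustrative model: each species is described by the domains it contains.
SPECIES = {
    "native hedgehog": {"N", "C"},
    "N polypeptide": {"N"},
    "C polypeptide": {"C"},
}

def captured(sample: set, domain: str) -> set:
    """Species in the sample that an antibody against `domain` would bind."""
    return {name for name in sample if domain in SPECIES[name]}

def purify_n(sample: set) -> set:
    """Anti-N capture followed by removal of anti-C-reactive material."""
    bound_by_anti_n = captured(sample, "N")   # N and native hedgehog
    return bound_by_anti_n - captured(bound_by_anti_n, "C")  # native hedgehog removed

print(purify_n(set(SPECIES)))
```

Exchanging the order of the two domains in the same scheme models the corresponding purification of C polypeptide.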

Monoclonal antibodies of the invention are suited for use, for example, in immunoassays 
in which they can be utilized in liquid phase or bound to a solid phase carrier. In
addition, the monoclonal antibodies in these immunoassays can be detectably labeled in
various ways. Examples of types of immunoassays which can utilize monoclonal
antibodies of the invention are competitive and non-competitive immunoassays in either
a direct or indirect format. Examples of such immunoassays are the radioimmunoassay
(RIA) and the sandwich (immunometric) assay. Detection of the antigens using the
monoclonal antibodies of the invention can be done utilizing immunoassays which are
run in either the forward, reverse, or simultaneous modes, including
immunohistochemical assays on physiological samples. Those of skill in the art will know, or can readily
discern, other immunoassay formats without undue experimentation. 

The term "immunometric assay" or "sandwich immunoassay", includes simultaneous 
sandwich, forward sandwich and reverse sandwich immunoassays. These terms are well 
understood by those skilled in the art. Those of skill will also appreciate that antibodies
according to the present invention will be useful in other variations and forms of assays 
which are presently known or which may be developed in the future. These are intended 
to be included within the scope of the present invention. 

Monoclonal antibodies can be bound to many different carriers and used to detect the 
presence of N or C polypeptide. Examples of well-known carriers include glass,
polystyrene, polypropylene, polyethylene, dextran, nylon, amyloses, natural and modified
celluloses, polyacrylamides, agaroses and magnetite. The nature of the carrier can be
either soluble or insoluble for purposes of the invention. Those skilled in the art will
know of other suitable carriers for binding monoclonal antibodies, or will be able to
ascertain such using routine experimentation.

For purposes of the invention, N or C polypeptide may be detected by the monoclonal 
antibodies when present in biological fluids and tissues. Any sample containing a 
detectable amount of N or C can be used. A sample can be a liquid such as urine, saliva, 
cerebrospinal fluid, blood, serum and the like, or a solid or semi-solid such as tissues, 
feces, and the like, or, alternatively, a solid tissue such as those commonly used in
histological diagnosis. C polypeptide in particular is detectable in biological samples,
since it tends to diffuse more readily than N polypeptide.




In performing the assays it may be desirable to include certain "blockers" in the 
incubation medium (usually added with the labeled soluble antibody). The "blockers" 
are added to assure that non-specific proteins, proteases, or anti-heterophilic immuno- 
globulins to anti-C or N immunoglobulins present in the experimental sample do not 
cross-link or destroy the antibodies on the solid phase support, or the radiolabeled
indicator antibody, to yield false positive or false negative results. The selection of 
"blockers" therefore may add substantially to the specificity of the assays described in 
the present invention. 

The invention also provides a method for modulating proliferation or differentiation of 
neuronal cells comprising contacting the cells with a hedgehog polypeptide. The
hedgehog polypeptide may be a native hedgehog polypeptide, or a N or C polypeptide,
or functional fragments thereof. Preferably, the modulation is induction of proliferation
or differentiation of a particular cell type. This can involve either synergistic positive
induction of neuronal cells by N, or negative modulation by delta N-C for example (Lai,
et al., Development, 121:2349, 1995). Delta N-C enhances expression of posterior relative
to anterior neural genes and does so through inhibition of N (see EXAMPLE 18 and
Figure 18D). In addition to hedgehog polypeptide, a TGF-β factor may also be utilized
in the method of the invention.

Previous studies with the rat hedgehog gene showed that co-culture of cells expressing 
rat hedgehog precursor gene, with explant from neural tube, was sufficient to induce
formation of motor neurons and floor plate from the explant (Jessell, T., and Dodd, J.,
In Cell-Cell Signaling in Vertebrate Development (ed. E.J. Robertson, et al.), pp. 139-155,
San Diego, Ca., 1993). Therefore, based on the Examples herein showing that hedgehog
is expressed near the floor plate of the ventral midline of the neural tube and notochord,
neuronal cells substantially derived from floor plate neuronal cells can be induced by
contacting the cells with hedgehog, N or C polypeptide. As used herein, the term
"substantially derived" refers to those cells from the floor plate or proximate to the floor
plate. For example, such cells include motor neurons and dopaminergic neurons. Those
of skill in the art will be able to identify other neuronal cells substantially derived from
the floor plate. Preferably the cells are vertebrate cells and most preferably, human
cells.

In addition, as described herein in the Examples, hedgehog, and particularly C fragment, 
induces the expression of pituitary genes. Hedgehog is also effective in inducing anterior
brain gene expression as exemplified by the OTX-A marker. Further, the addition of a
TGF-β family member, for example activin, may be used to further induce expression
of such genes. Other TGF-β family members will be known to those of skill in the art.
This apparent synergy of hh fragments with TGF-β family members occurs through the
TGF-β protein inducing expression of neural inducers such as noggin and follistatin. The
hh fragment then synergizes with these inducers to pattern neural gene expression. 

hh fragments may also be useful as nerve-sparing agents or in restoring or promoting 
appropriate patterning during the healing of major limb trauma. In addition, the N and 
C fragments may be useful in the area of genetic counseling. Specifically, familial 
midline defects such as cyclopia, polydactyly or neural tube defects may be diagnosed
by mapping close to hh. Since autoproteolytic defects may be responsible for the 
disorders, N or C therapy could be provided. 

The invention also provides an autoproteolytic fusion protein comprising a first 
polypeptide including the proteolytic domain of the C polypeptide of the invention, a
cleavage site recognized by the first polypeptide, and a second polypeptide. (It is
understood that the first and second polypeptides can be reversed.) The autoproteolytic
activity of the native hedgehog protein is found entirely within the C polypeptide;
therefore, the C polypeptide is useful for producing a fusion polypeptide which can then
be cleaved at the junction of the C polypeptide and the second polypeptide. The fusion
protein may optionally have a purification tag, such as a poly-histidine tag for isolation
on a nickel column, or an antibody epitope tag, preferably on the C fragment. The
cleavage site includes the sequence "GCF", which is recognized by the proteolytic
domain of the C polypeptide and is utilized to cleave the second polypeptide from the C
fragment. Also included in the invention is a polynucleotide encoding the fusion protein
of the invention.

The invention also provides a method for producing an autoproteolytic fusion protein 
comprising operably linking a first polynucleotide, wherein the first polynucleotide
encodes a first polypeptide including the proteolytic domain of the C polypeptide of the 
invention and the cleavage site recognized by the proteolytic domain, and a second 
polynucleotide encoding a second polypeptide. As described above, the fusion protein 
may also include a carrier peptide and/or a purification tag. 

The C polypeptide or functional fragment thereof is useful as a fusion partner to cause
lipophilic modification and tethering of other proteins in vivo or in vitro. Such fusion
proteins may be desirable for factors whose activity is required in a localized manner,
either by targeting DNA constructs to specific cells or by introducing cells transfected
with specific DNA constructs, for example. It may be desirable to lipid-modify a
normally secreted protein in order to produce a cell-associated protein. For example, it
may be desirable to produce a viral antigen that remains cell associated. Specifically, 
cholesterol is covalently attached to the N-terminal protein during autoprocessing and 
the C polypeptide acts as an intramolecular cholesterol transferase. 

Alternatively, the C polypeptide or functional fragments thereof can be used as a fusion 
partner with a protein of interest (e.g., Protein X fused to hh-C domain). Such fusions
form thioesters at the junction between Protein X and hh-C (via an N to S shift). The
thioesters are then available as substrates for a peptide ligation reaction in which any 
peptide or protein having an amino terminal cysteine (Peptide Y) is added and undergoes 
spontaneous rearrangement (S to N shift) that generates a stable peptide bond between 
Protein X and Peptide Y (Protein X-peptide bond-Peptide Y). For example, a protein
that is toxic when produced in vivo could be produced in vitro using the hh-C domain 
fusion protein method. 




The fusion polypeptide may also include an optional carrier peptide. The "carrier 
peptide", or signal sequence, is located at the amino terminal end of the fusion peptide 
sequence. In the case of eukaryotes, the carrier peptide is believed to function to 
transport the fusion polypeptide across the endoplasmic reticulum. The secretory protein
is then transported through the Golgi apparatus, into secretory vesicles and into the
extracellular space or, preferably, the external environment. Carrier peptides which can
be utilized according to the invention include pre-pro peptides which contain a
proteolytic enzyme recognition site. Acceptable carrier peptides include the amino
terminal pro-region of calcitonin or other hormones, which undergo cleavage at the
flanking dibasic sites. However, it should be noted that the invention is not limited to
the use of any particular peptide as a carrier. Other carrier peptides are known to those 
skilled in the art or can be readily ascertained without undue experimentation. 

In one embodiment of the invention, a carrier peptide which is a signal sequence is
included in the expression vector, specifically located adjacent to the N-terminal end of
the fusion polypeptide. This signal sequence allows the fusion protein to be directed
toward the endoplasmic reticulum. Typically, the signal sequence consists of a leader
of from about 16 to about 29 amino acids, starting with two or three polar residues and
continuing with a high content of hydrophobic amino acids; there is otherwise no
detectable conservation of sequence known. Such signal sequences are known to those
of skill in the art, and include the naturally occurring signal sequence derived from a
hedgehog protein.

The fusion polypeptide of the invention includes a polypeptide encoded by a structural
gene, preferably at the amino-terminus of the fusion polypeptide. Any structural gene
can be expressed in conjunction with the polynucleotide encoding the C polypeptide and,
optionally, a carrier peptide. The structural gene is operably linked with the carrier in
an expression vector so that the fusion polypeptide is expressed as a single unit.

The identification of the autoproteolysis of hedgehog into the N and C domains is useful
in a screening method to identify compounds or compositions which affect this
processing activity. Thus, in another embodiment, the invention provides a method for
identifying a composition which affects hh processing, which can be determined by
activity or gene expression, comprising incubating the components, which include the
composition to be tested (e.g., a drug, a small molecule, a protein) and a hh polypeptide
or a recombinant cell expressing hedgehog or a gene encoding a C domain or functional
fragment thereof operably linked to an N domain or functional fragment thereof, under
conditions sufficient to allow the components to interact, then subsequently measuring
the effect the composition has on hedgehog activity or expression. Fragments of
hedgehog polypeptide or polynucleotide can be used in the method of the invention as
long as autoproteolytic activity remains (e.g., the construct exemplified in Figures 12a
and 12b, Example 10). The observed effect on hh may be either inhibitory or stimulatory.
For example, one can determine whether the N domain is associated with the cell, or
whether the N domain is secreted into the medium, in other words, whether incomplete
processing has occurred. Such methods for determining the effect of the compound or
composition on hh processing include those described herein (see Example 10, Figures
12a and 12b), such as the time course of autoproteolytic cleavage or the course of
cleavage based on concentration ranges. Alternatively, the effect of the composition on
hh can be determined by the expression of anterior or posterior neural markers. Other
methods for determining the effect of a composition on processing of N and C will be
known to those of skill in the art. Various labels can be used to detect the N and C
domains; for example, a radioisotope, a fluorescent compound, a bioluminescent compound,
a chemiluminescent compound, a metal chelator or an enzyme could be used. Those of
ordinary skill in the art will know of other suitable labels or will be able to ascertain
such, using routine experimentation.

The identification of the lipid modification of the N domain of hedgehog by the C
domain, resulting in a biologically active N domain, is useful in a screening method to
identify compounds or compositions which affect the cholesterol transferase/processing
activity of hedgehog. In a broader aspect, the modification may be a general sterol or
lipid modification, and not limited to cholesterol. Thus, in another embodiment, the
invention provides a method for identifying a composition which affects hh biological
activity, which can be determined by activity or lipid modification (e.g., cholesterol),
comprising incubating the components, which include the composition to be tested (e.g.,
a drug, a small molecule, a protein) and a hh polypeptide or a recombinant cell
expressing hedgehog or a gene encoding a C domain or functional fragment thereof
operably linked to an N domain or functional fragment thereof, under conditions
sufficient to allow the components to interact, then subsequently measuring the effect the
composition has on hedgehog activity. Fragments of hedgehog polypeptide or
polynucleotide can be used in the method of the invention as long as cholesterol
transferase activity remains, for example. The effect on hh may be either inhibitory or
stimulatory. For example, one can determine whether the N domain is associated with
the cell, or whether the N domain is secreted into the medium, in other words, whether
incomplete processing and modification has occurred. Such methods for determining the
effect of the compound or composition on hh processing include those described herein
(see Example 10, Figures 12a and 12b), such as the time course of autoproteolytic cleavage
or the course of cleavage based on concentration ranges. Alternatively, the effect of the
composition on hh can be determined by the level of cholesterol modification as
determined by thin layer chromatography (e.g., Example 19, Figure 23) or incorporation
of labeled cholesterol into hh protein (e.g., Example 19, Figure 25) or into a fragment
appended to the transferase (C) domain. Other methods for determining the effect of a
composition on processing and cholesterol modification of N and C will be known to
those of skill in the art. Various labels can be used to detect the N and C domains; for
example, a radioisotope, a fluorescent compound, a bioluminescent compound, a
chemiluminescent compound, a metal chelator or an enzyme could be used. Those of
ordinary skill in the art will know of other suitable labels or will be able to ascertain
such, using routine experimentation.

As used herein, "hh activity" as described in the screening method refers preferably to
autoproteolytic activity. However, it is understood that one of skill in the art could use
the above-described screening assay to identify a composition having an effect on other
hh activities, for example, zinc hydrolase activity or cholesterol transferase activity, or
induction or regulation of differentiation of neuronal cells or chondrocytes. Appropriate
assays for determining the effect on such activities will be known to those of skill in the
art. Example 19 provides lipophilic modification assays useful in the screening methods
described above.

Now that the present invention describes the cholesterol modification of N by C, it is
possible to design various diagnostic and therapeutic approaches for treatment of
hh-associated disorders due to defective or altered sterol modification. For example,
Smith-Lemli-Opitz syndrome (SLOS) is characterized by a loss of hh function and a sterol
profile indicating a cholesterol deficiency. Therefore, SLOS may be diagnosed and/or
treated based on the cholesterol profile. Further, a defect in Desert hh in the testes is
associated with male sterility (M. Bitgood, L. Shen, A.P. McMahon, Current Biology 6,
298, 1996; A. Vortkamp et al., Science 273, 613, 1996); consequently, it may be possible
to design male contraceptives based on defective cholesterol modification of hh. On the
other hand, if sterility or decreased fertility were desirable, hh cholesterol transferase
activity could be altered to reduce cholesterol modification. Processing of the C and N
fragments of hh is required for hh activity; therefore, alterations in cholesterol
modification of the amino-terminal fragment may also be related to developmental defects
in vertebrate embryos.

Another aspect of the present invention concerns three-dimensional molecular models
of the subject hedgehog proteins, and their use as templates for the design of agents able
to inhibit or potentiate at least one biological activity of the hedgehog, particularly the
autoproteolytic activity. An integral step in our approach to designing inhibitors of the
subject hedgehog proteins, for example, involves construction of computer graphics models
of the hedgehog protein which can be used to design pharmacophores by rational drug
design. For instance, for an inhibitor to interact optimally with the subject proteolytic
domain of hedgehog, it will generally be desirable that it have a shape which is at least
partly complementary to that of a particular binding site of the enzyme, as for example
those portions of the human hedgehog protein which are involved in the autoproteolytic
activity. Additionally, other factors, including electrostatic interactions, hydrogen
bonding, hydrophobic interactions, desolvation effects, and cooperative motions of ligand
and enzyme, all influence the binding effect and should be taken into account in attempts
to design bioactive inhibitors.

A computer-generated molecular model of the subject hedgehog proteins can be created.
In preferred embodiments, at least the Cα-carbon positions of the hedgehog sequence of
interest are mapped to a particular coordinate pattern, such as the coordinates for
hedgehog determined by x-ray crystallography, by homology modeling, and the structure
of the protein and velocities of each atom are calculated at a simulation temperature (T0)
at which the docking simulation is to be determined. Typically, such a protocol involves
primarily the prediction of side-chain conformations in the modeled protein, while
assuming a main-chain trace taken from a tertiary structure such as provided in the x-ray
crystallographic model described herein. Computer programs for performing energy
minimization routines are commonly used to generate molecular models. For example,
both the CHARMM (Brooks et al. (1983) J Comput Chem 4:187-217) and AMBER
(Weiner et al. (1981) J. Comput Chem. 106: 765) algorithms handle all of the molecular
system setup, force field calculation, and analysis (see also, Eisenfield et al. (1991) Am
J Physiol 261:C376-386; Lybrand (1991) J Pharm Belg 46:49-54; Froimowitz (1990)
Biotechniques 8:640-644; Burbam et al. (1990) Proteins 7:99-111; Pedersen (1985)
Environ Health Perspect 61:185-190; and Kini et al. (1991) J Biomol Struct Dyn
9:475-488). At the heart of these programs is a set of subroutines that, given the position
of every atom in the model, calculate the total potential energy of the system and the force
on each atom. These programs may utilize a starting set of atomic coordinates, such as
the model coordinates provided in crystallographic-derived models, the parameters for
the various terms of the potential energy function, and a description of the molecular
topology (the covalent structure). Common features of such molecular modeling
methods include: provisions for handling hydrogen bonds and other constraint forces; the
use of periodic boundary conditions; and provisions for occasionally adjusting positions,
velocities, or other parameters in order to maintain or change temperature, pressure,
volume, forces of constraint, or other externally controlled conditions.

Most conventional energy minimization methods use the input data described above and
the fact that the potential energy function is an explicit, differentiable function of
Cartesian coordinates, to calculate the potential energy and its gradient (which gives the
force on each atom) for any set of atomic positions. This information can be used to
generate a new set of coordinates in an effort to reduce the total potential energy and, by
repeating this process over and over, to optimize the molecular structure under a given
set of external conditions. These energy minimization methods are routinely applied to
molecules similar to the subject hedgehog proteins as well as nucleic acids, polymers and
zeolites.
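
The loop described above (compute the energy and its gradient, move the atoms downhill,
repeat) can be sketched as follows. This is only a minimal illustration, assuming a bare
Lennard-Jones pair potential in reduced units and a plain steepest-descent update; the
function names, step size and tolerance are hypothetical choices and are not taken from
CHARMM, AMBER or the models described herein.

```python
import math

def lj_energy_and_gradient(coords, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy and its gradient for a list of [x, y, z] positions."""
    n = len(coords)
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[i][k] - coords[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            sr6 = (sigma * sigma / r2) ** 3
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # (dE/dr) / r, so that the gradient on atom i is coeff * d
            coeff = -24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            for k in range(3):
                grad[i][k] += coeff * d[k]
                grad[j][k] -= coeff * d[k]
    return energy, grad

def minimize(coords, step=0.01, max_iter=5000, tol=1e-6):
    """Steepest descent: accept a downhill move, otherwise halve the step."""
    coords = [list(c) for c in coords]
    energy, grad = lj_energy_and_gradient(coords)
    for _ in range(max_iter):
        gnorm = math.sqrt(sum(g * g for row in grad for g in row))
        if gnorm < tol:
            break
        trial = [[c[k] - step * g[k] for k in range(3)] for c, g in zip(coords, grad)]
        trial_energy, trial_grad = lj_energy_and_gradient(trial)
        if trial_energy < energy:
            coords, energy, grad = trial, trial_energy, trial_grad
        else:
            step *= 0.5
    return coords, energy
```

For two atoms started 1.5 sigma apart, the routine should settle near the well-known
Lennard-Jones minimum at a separation of 2^(1/6) sigma. Production force fields add
bonded, angle, torsion and electrostatic terms and typically use conjugate-gradient or
similar updates rather than plain steepest descent.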

In general, energy minimization methods can be carried out for a given temperature, Tj,
which may be different from the docking simulation temperature, T0. Upon energy
minimization of the molecule at Tj, coordinates and velocities of all the atoms in the
system are computed. Additionally, the normal modes of the system are calculated. It
will be appreciated by those skilled in the art that each normal mode is a collective,
periodic motion, with all parts of the system moving in phase with each other, and that
the motion of the molecule is the superposition of all normal modes. For a given
temperature, the mean square amplitude of motion in a particular mode is inversely
proportional to the effective force constant for that mode, so that the motion of the
molecule will often be dominated by the low frequency vibrations.
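
The inverse proportionality stated above follows from classical equipartition for a
harmonic mode. As a sketch, with symbols that are introduced here for illustration only
(q_i for the displacement along mode i, k_i for its effective force constant, mu_i for its
effective mass, and omega_i for its angular frequency):

```latex
\tfrac{1}{2} k_i \langle q_i^2 \rangle = \tfrac{1}{2} k_B T
\quad\Longrightarrow\quad
\langle q_i^2 \rangle = \frac{k_B T}{k_i} = \frac{k_B T}{\mu_i \, \omega_i^2}
```

Modes with small force constants (low frequencies) therefore have the largest mean square
amplitudes at a given temperature.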

After the molecular model has been energy minimized at Tj, the system is "heated" or
"cooled" to the simulation temperature, T0, by carrying out an equilibration run where
the velocities of the atoms are scaled in a step-wise manner until the desired temperature,
T0, is reached. The system is further equilibrated for a specified period of time until
certain properties of the system, such as average kinetic energy, remain constant. The
coordinates and velocities of each atom are then obtained from the equilibrated system.
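
The step-wise velocity scaling described above can be sketched as follows. This is an
illustrative fragment only: it assumes reduced units in which the Boltzmann constant is 1,
the function names are hypothetical, and the dynamics steps that a real equilibration run
would perform between rescalings are omitted.

```python
K_B = 1.0  # Boltzmann constant in reduced units (an assumption of this sketch)

def kinetic_temperature(masses, velocities):
    """Instantaneous temperature from equipartition: sum(m * v^2) = 3 * N * k_B * T."""
    twice_ke = sum(m * sum(c * c for c in v) for m, v in zip(masses, velocities))
    return twice_ke / (3.0 * len(masses) * K_B)

def rescale_to(masses, velocities, target):
    """Scale every velocity by one factor so the kinetic temperature equals target."""
    scale = (target / kinetic_temperature(masses, velocities)) ** 0.5
    return [[c * scale for c in v] for v in velocities]

def stepwise_heat(masses, velocities, t_final, n_steps):
    """Move the kinetic temperature to t_final in n_steps equal increments."""
    t_start = kinetic_temperature(masses, velocities)
    for step in range(1, n_steps + 1):
        target = t_start + (t_final - t_start) * step / n_steps
        velocities = rescale_to(masses, velocities, target)
        # A real run would integrate the equations of motion here before rescaling again.
    return velocities
```

The same routine "cools" the system when t_final is below the starting temperature.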

Further energy minimization routines can also be carried out. For example, a second
class of methods involves calculating approximate solutions to the constrained equations
of motion (EOM) for the protein. These methods use an iterative approach to solve for the
Lagrange multipliers and, typically, only need a few iterations if the corrections required
are small. The most popular method of this type, SHAKE (Ryckaert et al. (1977) J Comput
Phys 23:327; and Van Gunsteren et al. (1977) Mol Phys 34:1311), is easy to implement and
scales as O(N) as the number of constraints increases. Therefore, the method is
applicable to macromolecules such as the hedgehog proteins of the present invention.
An alternative method, RATTLE (Andersen (1983) J Comput Phys 52:24), is based on
the velocity version of the Verlet algorithm. Like SHAKE, RATTLE is an iterative
algorithm and can be used to energy minimize the model of the subject hedgehog protein.
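
The iterative correction at the core of a SHAKE-type method can be sketched as follows.
This is a simplified illustration, not the published implementation: it shows only the
position-correction loop for fixed bond lengths, applied to positions that an unconstrained
integration step is assumed to have already produced. The function name, tolerance and
data layout are hypothetical.

```python
def shake(old, new, masses, constraints, tol=1e-10, max_iter=500):
    """Correct `new` positions so each (i, j, d) constraint has |r_i - r_j| = d.

    `old` holds positions from the previous step, which already satisfy the constraints;
    corrections are applied along the old bond vectors, weighted by inverse mass.
    """
    new = [list(p) for p in new]
    for _ in range(max_iter):
        converged = True
        for i, j, d in constraints:
            s = [new[i][k] - new[j][k] for k in range(3)]
            diff = d * d - sum(c * c for c in s)
            if abs(diff) > tol * d * d:
                converged = False
                r_old = [old[i][k] - old[j][k] for k in range(3)]
                inv_mass = 1.0 / masses[i] + 1.0 / masses[j]
                # First-order estimate of the Lagrange multiplier for this constraint
                g = diff / (2.0 * inv_mass * sum(a * b for a, b in zip(s, r_old)))
                for k in range(3):
                    new[i][k] += g * r_old[k] / masses[i]
                    new[j][k] -= g * r_old[k] / masses[j]
        if converged:
            return new
    raise RuntimeError("SHAKE did not converge")
```

Each sweep visits every constraint once, which is consistent with the linear scaling in
the number of constraints noted above. Because the two corrections in a pair are equal and
opposite when weighted by mass, the center of mass is left unchanged.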

The increasing availability of biomacromolecule structures of potential pharmacophore
molecules that have been solved crystallographically has prompted the development of
a variety of direct computational methods for molecular design, in which the steric and
electronic properties of catalytic and substrate recognition sites are used to guide the
design of potential inhibitors (Cohen et al. (1990) J Med Chem 33:883-894; Kuntz et
al. (1982) J Mol Biol 161:269-288; DesJarlais (1988) J Med Chem 31:722-729;
Bartlett et al. (1989) (Spec. Publ., Roy. Soc. Chem.) 78:182-196; Goodford et al. (1985)
J Med Chem 28:849-857; DesJarlais et al. J Med Chem 29:2149-2153). Direct
methods generally fall into two categories: (1) design by analogy, in which 3-D
structures of known molecules (such as from a crystallographic database) are docked to
the enzyme structure and scored for goodness-of-fit; and (2) de novo design, in which the
ligand model is constructed piece-wise in the enzyme. The latter approach, in particular,
can facilitate the development of novel molecules, uniquely designed to bind to, and, e.g.,
inhibit the proteolytic activity of a hedgehog protein.

In an illustrative embodiment, the design of potential hedgehog inhibitors begins from
the general perspective of shape complementarity for the active site and substrate
specificity subsites of the enzyme, and a search algorithm is employed which is capable
of scanning a database of small molecules of known three-dimensional structure for
candidates which fit geometrically into the target protein site. It is not expected that the
molecules found in the shape search will necessarily be leads themselves, since no
evaluation of chemical interaction is necessarily made during the initial search. Rather,
it is anticipated that such candidates might act as the framework for further design,
providing molecular skeletons to which appropriate atomic replacements can be made.
Of course, the chemical complementarity of these molecules can be evaluated, but it is
expected that atom types will be changed to maximize the electrostatic, hydrogen
bonding, and hydrophobic interactions with the enzyme. Most algorithms of this type
provide a method for finding a wide assortment of chemical structures that are
complementary to the shape of a binding site of the subject enzyme. Each of a set of
small molecules from a particular database, such as the Cambridge Crystallographic
Data Bank (CCDB) (Allen et al. (1973) J. Chem. Doc. 13:119), is individually docked
to the binding site of the hedgehog proteolytic domain in a number of geometrically
permissible orientations with use of a docking algorithm. In a preferred embodiment, a
set of computer algorithms called DOCK can be used to characterize the shape of
invaginations and grooves that form the active sites and recognition surfaces of the
subject protein (Kuntz et al. (1982) J. Mol. Biol. 161:269-288). The program can also
search a database of small molecules for templates whose shapes are complementary to
particular binding sites of the enzyme (DesJarlais et al. (1988) J Med Chem
31:722-729). These templates normally require modification to achieve good chemical and
electrostatic interactions (DesJarlais et al. (1989) ACS Symp Ser 413:60-69). However,
the program has been shown to position accurately known cofactors or inhibitors based
on shape constraints alone.

The orientations are evaluated for goodness-of-fit and the best are kept for further
examination using molecular mechanics programs, such as AMBER or CHARMM.

Such algorithms have previously proven successful in finding a variety of molecules that
are complementary in shape to a given binding site of a receptor-enzyme, and have been
shown to have several attractive features. First, such algorithms can retrieve a
remarkable diversity of molecular architectures. Second, the best structures have, in
previous applications to other proteins, demonstrated impressive shape complementarity
over an extended surface area. Third, the overall approach appears to be quite robust
with respect to small uncertainties in positioning of the candidate atoms.

Goodford (1985, J Med Chem 28:849-857) and Boobbyer et al. (1989, J Med Chem
32:1083-1094) have produced a computer program (GRID) which seeks to determine
regions of high affinity for different chemical groups (termed probes) on the molecular
surface of the binding site. GRID hence provides a tool for suggesting modifications to
known ligands that might enhance binding. It may be anticipated that some of the sites
discerned by GRID as regions of high affinity correspond to "pharmacophore patterns"
determined inferentially from a series of known ligands. As used herein, a
pharmacophore pattern is a geometric arrangement of features of the anticipated ligand
that is believed to be important for binding. Attempts have been made to use
pharmacophore patterns as a search screen for novel ligands (Jakes et al. (1987) J Mol
Graph 5:41-48; Brint et al. (1987) J Mol Graph 5:49-56; Jakes et al. (1986) J Mol Graph
4:12-20); however, the constraint of steric and "chemical" fit in the putative (and
possibly unknown) receptor binding site is ignored. Goodsell and Olson (1990, Proteins:
Struct Funct Genet 8:195-202) have used the Metropolis (simulated annealing) algorithm
to dock a single known ligand into a target protein. They allow torsional flexibility in
the ligand and use GRID interaction energy maps as rapid lookup tables for computing
approximate interaction energies. Given the large number of degrees of freedom
available to the ligand, the Metropolis algorithm is time-consuming and is unsuited to
searching a candidate database of a few thousand small molecules.

Yet a further embodiment of the present invention utilizes a computer algorithm such as
CLIX which searches such databases as CCDB for small molecules which can be
oriented in the receptor binding site in a way that is both sterically acceptable and has a
high likelihood of achieving favorable chemical interactions between the candidate
molecule and the surrounding amino acid residues. The method is based on characterizing
the receptor site in terms of an ensemble of favorable binding positions for different
chemical groups and then searching for orientations of the candidate molecules that cause
maximum spatial coincidence of individual candidate chemical groups with members of
the ensemble. The current availability of computer power dictates that a computer-based
search for novel ligands follows a breadth-first strategy. A breadth-first strategy aims
to reduce progressively the size of the potential candidate search space by the application
of increasingly stringent criteria, as opposed to a depth-first strategy wherein a
maximally detailed analysis of one candidate is performed before proceeding to the next.
CLIX conforms to this strategy in that its analysis of binding is rudimentary: it seeks to
satisfy the necessary conditions of steric fit and of having individual groups in "correct"
places for bonding, without imposing the sufficient condition that favorable bonding
interactions actually occur. A ranked "shortlist" of molecules, in their favored
orientations, is produced which can then be examined on a molecule-by-molecule basis,
using computer graphics and more sophisticated molecular modeling techniques. CLIX
is also capable of suggesting changes to the substituent chemical groups of the candidate
molecules that might enhance binding.
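
The breadth-first strategy described above amounts to passing the whole candidate pool
through a sequence of progressively stricter, and usually more expensive, filters. The
following is a generic sketch of that idea only; it is not the CLIX code, and the stage
names, scoring functions and keep counts are hypothetical placeholders.

```python
def breadth_first_screen(candidates, stages):
    """Apply each (name, score_fn, keep) stage to the whole surviving pool in turn.

    Every candidate gets the cheap test before any candidate gets the costly one,
    in contrast to a depth-first analysis of one candidate at a time.
    """
    pool = list(candidates)
    history = []
    for name, score, keep in stages:
        pool = sorted(pool, key=score, reverse=True)[:keep]
        history.append((name, len(pool)))
    return pool, history
```

The returned pool plays the role of the ranked shortlist. A candidate that would have
scored well at a later stage can be lost at an earlier one, which is the price paid for
covering a large database quickly.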

The algorithmic details of CLIX are described in Lawrence et al. (1992) Proteins
12:31-41, and the CLIX algorithm can be summarized as follows. The GRID program is used
to determine discrete favorable interaction positions (termed target sites) in the binding
site of the protein for a wide variety of representative chemical groups. For each
candidate ligand in the CCDB an exhaustive attempt is made to make coincident, in a
spatial sense in the binding site of the protein, a pair of the candidate's substituent
chemical groups with a pair of corresponding favorable interaction sites proposed by
GRID. All possible combinations of pairs of ligand groups with pairs of GRID sites are
considered during this procedure. Upon locating such coincidence, the program rotates
the candidate ligand about the two pairs of groups and checks for steric hindrance and
coincidence of other candidate atomic groups with appropriate target sites. Particular
candidate/orientation combinations that are good geometric fits in the binding site and
show sufficient coincidence of atomic groups with GRID sites are retained.
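
The first step summarized above, finding pairs of ligand groups that can be superimposed
on pairs of target sites of the corresponding type, can be sketched as a distance
comparison. This is an illustrative simplification, not the published algorithm: the data
layout, the group type labels and the 0.5 distance tolerance are assumptions, and the
subsequent rotation and steric checks are not shown.

```python
import math
from itertools import combinations, permutations

def coincident_pairs(ligand_groups, target_sites, tol=0.5):
    """Return ((a, b), (p, q)) index pairs where ligand groups a, b can sit on sites p, q.

    Each group or site is a (type_label, (x, y, z)) tuple. A pair matches when the types
    agree and the group-group distance equals the site-site distance within `tol`.
    """
    matches = []
    for a, b in combinations(range(len(ligand_groups)), 2):
        d_lig = math.dist(ligand_groups[a][1], ligand_groups[b][1])
        for p, q in permutations(range(len(target_sites)), 2):
            if ligand_groups[a][0] != target_sites[p][0]:
                continue
            if ligand_groups[b][0] != target_sites[q][0]:
                continue
            d_site = math.dist(target_sites[p][1], target_sites[q][1])
            if abs(d_lig - d_site) <= tol:
                matches.append(((a, b), (p, q)))
    return matches
```

Each match leaves one rotational degree of freedom, about the axis through the two
superimposed groups, which is the rotation explored in the next step of the procedure.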

Consistent with the breadth-first strategy, this approach involves simplifying
assumptions. Rigid protein and small molecule geometry is maintained throughout. As a
first approximation, rigid geometry is acceptable, as the energy-minimized coordinates of
the deduced hedgehog structure describe an energy minimum for the molecule, albeit a local
one. If the surface residues of the site of interest are not involved in crystal contacts,
then we believe that the crystal configuration of those residues, as given by the deduced
crystal structure described herein, should reasonably mimic the mean solution
configuration. Moreover, the equivalent models of hedgehog isoforms (Ihh, Dhh, etc.) can
be derived by the same method.

A further assumption implicit in CLIX is that the potential ligand, when introduced into
the active site of hedgehog protein, does not induce change in the protein's stereochemistry
or partial charge distribution and so alter the basis on which the GRID interaction energy
maps were computed. It must also be stressed that the interaction sites predicted by
GRID are used in a positional and type sense only, i.e., when a candidate atomic group
is placed at a site predicted as favorable by GRID, no check is made to ensure that the
bond geometry, the state of protonation, or the partial charge distribution favors a strong
interaction between the protein and that group. Such detailed analysis should form part
of more advanced modeling of candidates identified in the CLIX shortlist.

Yet another embodiment of a computer-assisted molecular design method for identifying
inhibitors of the subject hedgehog protein comprises the de novo synthesis of potential
inhibitors by algorithmic connection of small molecular fragments that will exhibit the
desired structural and electrostatic complementarity with the active site of the enzyme.
The methodology employs a large template set of small molecules which are iteratively
pieced together in a model of the hedgehog active site. Each stage of ligand growth is
evaluated according to a molecular mechanics-based energy function, which considers
van der Waals and coulombic interactions, internal strain energy of the lengthening
ligand, and desolvation of both ligand and enzyme. The search space can be managed
by use of a data tree which is kept under control by pruning according to the binding
criteria.

In an illustrative embodiment, the search space is limited to consider only amino acids
and amino acid analogs as the molecular building blocks. Such a methodology generally
employs a large template set of amino acid conformations, though need not be restricted
to just the 20 natural amino acids, as it can easily be extended to include other related
fragments of interest to the medicinal chemist, e.g., amino acid analogs. The putative
ligands that result from this construction method are peptides and peptide-like
compounds rather than the small organic molecules that are typically the goal of drug
design research. The appeal of the peptide building approach is not that peptides are
preferable to organics as potential pharmaceutical agents, but rather that: (1) they can be
generated relatively rapidly de novo; (2) their energetics can be studied by
well-parameterized force field methods; (3) they are much easier to synthesize than are
most organics; and (4) they can be used in a variety of ways, for peptidomimetic inhibitor
design, protein-protein binding studies, and even as shape templates in the more
commonly used 3D organic database search approach described above.

Such a de novo peptide design method has been incorporated in a software package
called GROW (Moon et al. (1991) Proteins 11:314-328). In a typical design session,
standard interactive graphical modeling methods are employed to define the structural
environment in which GROW is to operate. For instance, the environment could be the
active site cleft of hedgehog, or it could be a set of features on the protein's surface to
which the user wishes to bind a peptide-like molecule, such as a peptide sequence based on
the cleavage site of hedgehog itself (e.g., to represent the autoproteolytic event). The
GROW program then operates to generate a set of potential ligand molecules. Interactive
modeling methods then come into play again, for examination of the resulting molecules,
and for selection of one or more of them for further refinement.

To illustrate, GROW operates on an atomic coordinate file generated by the user in the
interactive modeling session, such as the coordinates provided in the
crystallographic-derived models, plus a small fragment (e.g., an acetyl group) positioned
in the active site to provide a starting point for peptide growth. These are referred to as
"site" atoms and "seed" atoms, respectively. A second file provided by the user contains a
number of control parameters to guide the peptide growth (Moon et al. (1991) Proteins
11:314-328).

The operation of the GROW algorithm is conceptually fairly simple. GROW proceeds
in an iterative fashion, to systematically attach to the seed fragment each amino acid
template in a large preconstructed library of amino acid conformations. When a template
has been attached, it is scored for goodness-of-fit to the receptor site, and then the next
template in the library is attached to the seed. After all the templates have been tested,
only the highest scoring ones are retained for the next level of growth. This procedure
is repeated for the second growth level; each library template is attached in turn to each
of the bonded seed/amino acid molecules that were retained from the first step, and is
then scored. Again, only the best of the bonded seed/dipeptide molecules that result are
retained for the third level of growth. The growth of peptides can proceed in the N-to-C
direction only, the reverse direction only, or in alternating directions, depending on the
initial control specifications supplied by the user. Successive growth levels therefore
generate peptides that are lengthened by one residue. The procedure terminates when the
user-defined peptide length has been reached, at which point the user can select from the
constructed peptides those to be studied further. The resulting data provided by the
GROW procedure include not only residue sequences and scores, but also atomic
coordinates of the peptides, related directly to the coordinate system of the receptor site
atoms.
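
The level-by-level growth and pruning described above has the structure of what is
commonly called a beam search. The sketch below illustrates only that control flow and is
not the GROW program: residues are represented by name rather than by conformational
templates with coordinates, growth is in one direction only, and the scoring function is a
hypothetical stand-in supplied by the caller.

```python
def grow_peptides(library, score, target_length, keep=3):
    """Grow peptides one residue per level, retaining the `keep` best at each level.

    `library` is a list of residue templates, `score` maps a tuple of residues to a
    number (higher is better), and the bare seed is represented by the empty tuple.
    """
    survivors = [()]
    for _ in range(target_length):
        scored = []
        for parent in survivors:
            for template in library:
                child = parent + (template,)
                scored.append((score(child), child))
        # Keep only the highest scoring constructs for the next growth level
        scored.sort(key=lambda item: item[0], reverse=True)
        survivors = [child for _, child in scored[:keep]]
    return [(score(peptide), peptide) for peptide in survivors]
```

The number of constructs scored per level is at most `keep` times the library size, so the
cost grows linearly with peptide length rather than exponentially, at the risk of pruning
a sequence whose early residues score poorly.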

In yet another embodiment, potential pharmacophore compounds can be determined
using a method based on an energy minimization-quenched molecular dynamics
algorithm for determining energetically favorable positions of functional groups in the
binding sites of the subject hedgehog protein. The method can aid in the design of
molecules that incorporate such functional groups by modification of known ligands or
de novo construction.

An example is the multiple copy simultaneous search (MCSS) method described by
Miranker et al. (1991) Proteins 11:29-34. To determine and characterize the local minima
of a functional group in the forcefield of the protein, multiple copies of selected
functional groups are first distributed in a binding site of interest on the hedgehog
protein. Energy minimization of these copies by molecular mechanics or quenched
dynamics yields the distinct local minima. The neighborhood of these minima can then
be explored by a grid search or by constrained minimization. In one embodiment, the
MCSS method uses the classical time dependent Hartree (TDH) approximation to
simultaneously minimize or quench many identical groups in the forcefield of the
protein.

15 Implementation of the MCSS algorithm requires a choice of functional groups and a 
molecular mechanics model for each of them. Groups must be simple enough to be 
easily characterized and manipulated (3-6 atoms, few or no dihedral degrees of freedom), 
yet complex enough to approximate the steric and electrostatic interactions that the 
functional group would have in binding to the site of interest in the hedgehog protein. 

A preferred set is, for example, one in which most organic molecules can be described
as a collection of such groups (Patai's Guide to the Chemistry of Functional Groups, ed.
S. Patai (New York: John Wiley and Sons, 1989)). This includes fragments such as
acetonitrile, methanol, acetate, methyl ammonium, dimethyl ether, methane, and
acetaldehyde.

Determination of the local energy minima in the binding site requires that many starting
positions be sampled. This can be achieved by distributing, for example, 1,000-5,000
groups at random inside a sphere centered on the binding site; only the space not
occupied by the protein needs to be considered. If the interaction energy of a particular
group at a certain location with the protein is more positive than a given cut-off (e.g. 5.0
kcal/mole), the group is discarded from that site. Given the set of starting positions, all
the fragments are minimized simultaneously by use of the TDH approximation (Elber et
al. (1990) J Am Chem Soc 112: 9161-9175). In this method, the forces on each fragment
consist of its internal forces and those due to the protein. The essential element of this
method is that the interactions between the fragments are omitted and the forces on the
protein are normalized to those due to a single fragment. In this way simultaneous
minimization or dynamics of any number of functional groups in the field of a single
protein can be performed.
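
The seeding step described above can be illustrated with a minimal Python sketch. It is not the MCSS program itself: the energy function, its parameters, and all function names are hypothetical stand-ins chosen for illustration, and a real implementation would use a full molecular mechanics force field. The sketch only shows random placement inside a sphere and rejection of positions whose interaction energy is more positive than the cut-off.

```python
import math
import random


def random_point_in_sphere(center, radius, rng):
    """Uniform random point inside a sphere, by rejection from the bounding cube."""
    while True:
        offset = [rng.uniform(-radius, radius) for _ in range(3)]
        if sum(c * c for c in offset) <= radius * radius:
            return tuple(c0 + d for c0, d in zip(center, offset))


def toy_interaction_energy(point, protein_atoms, sigma=3.5, epsilon=0.1):
    """Hypothetical stand-in for a force-field energy (kcal/mole).

    Sums a 12-6 Lennard-Jones-like term over protein atom positions;
    sigma and epsilon are illustrative values, not taken from the text.
    """
    energy = 0.0
    for atom in protein_atoms:
        r = max(math.dist(point, atom), 0.1)  # avoid division by zero
        sr6 = (sigma / r) ** 6
        energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy


def seed_groups(center, radius, protein_atoms, n_groups=1000, cutoff=5.0, seed=0):
    """Place n_groups random copies and keep those at or below the energy cut-off."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_groups):
        position = random_point_in_sphere(center, radius, rng)
        # A group whose energy is more positive than the cut-off is discarded.
        if toy_interaction_energy(position, protein_atoms) <= cutoff:
            kept.append(position)
    return kept
```

Positions that overlap the protein receive a large positive energy from the repulsive term and are dropped, which is one simple way of restricting the starting set to the space not occupied by the protein.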

Minimization is performed successively on subsets of, e.g., 100 of the randomly placed
groups. After a certain number of step intervals, such as 1,000 intervals, the results can
be examined to eliminate groups converging to the same minimum. This process is
repeated until minimization is complete (e.g. RMS gradient of 0.01 kcal/mole/Å). Thus
the resulting energy-minimized set of molecules comprises what amounts to a set of
disconnected fragments in three dimensions representing potential pharmacophores.
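
As a rough illustration of the pruning and stopping criteria described above, the following Python sketch removes copies that have converged to the same minimum and tests an RMS gradient threshold. The distance tolerance of 0.2 Å and the function names are assumptions made for the example; the text does not specify how coincident minima are recognized.

```python
import math


def merge_converged(minima, tolerance=0.2):
    """Keep one representative per group of positions closer than `tolerance`.

    `minima` is a list of (position, energy) pairs; the lowest-energy copy
    of each cluster is retained. The tolerance (in angstroms) is an assumed value.
    """
    unique = []
    for position, energy in sorted(minima, key=lambda m: m[1]):
        if all(math.dist(position, kept_pos) > tolerance for kept_pos, _ in unique):
            unique.append((position, energy))
    return unique


def is_converged(rms_gradient, threshold=0.01):
    """True once the RMS gradient (kcal/mole/angstrom) reaches the threshold."""
    return rms_gradient <= threshold
```

In use, `merge_converged` would be called after each block of minimization steps, and the loop would stop once `is_converged` holds for the remaining copies.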

The next step then is to connect the pharmacophore pieces with spacers assembled from
small chemical entities (atoms, chains, or ring moieties). In a preferred embodiment,
each of the disconnected fragments can be linked in space to generate a single molecule
using such computer programs as, for example, NEWLEAD (Tschinke et al. (1993) J Med
Chem 36: 3863-3870). The procedure adopted by NEWLEAD executes the following sequence of
commands: (1) connect two isolated moieties, (2) retain the intermediate solutions for
further processing, (3) repeat the above steps for each of the intermediate solutions until
no disconnected units are found, and (4) output the final solutions, each of which is a
single molecule. Such a program can use, for example, three types of spacers: library
spacers, single-atom spacers, and fuse-ring spacers. The library spacers are optimized
structures of small molecules such as ethylene, benzene and methylamide. The output
produced by programs such as NEWLEAD consists of a set of molecules containing the
original fragments now connected by spacers. The atoms belonging to the input
fragments maintain their original orientations in space. The molecules are chemically
plausible because of the simple makeup of the spacers and functional groups, and
energetically acceptable because of the rejection of solutions with van der Waals radii
violations.
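
The connect-and-repeat loop described above can be sketched in a few lines of Python. This is a deliberately simplified toy, not NEWLEAD: fragments are reduced to single points, a "spacer" is just a recorded link between two fragments, only one solution is followed rather than a set of retained intermediates, and no steric checks are made. The function name and data layout are assumptions for illustration.

```python
import itertools
import math


def link_fragments(centroids):
    """Greedy toy linker: repeatedly join the two closest disconnected units.

    `centroids` is a list of 3-D points, one per fragment. Returns a list of
    (i, j, distance) links; joining stops when a single connected unit remains.
    """
    units = [{i} for i in range(len(centroids))]
    spacers = []
    while len(units) > 1:
        best = None
        # Find the closest pair of fragments belonging to different units.
        for a, b in itertools.combinations(range(len(units)), 2):
            for i in units[a]:
                for j in units[b]:
                    d = math.dist(centroids[i], centroids[j])
                    if best is None or d < best[0]:
                        best = (d, a, b, i, j)
        d, a, b, i, j = best
        spacers.append((i, j, d))
        units[a] |= units[b]  # merge the two units
        del units[b]          # b > a, so index a is unaffected
    return spacers
```

For n fragments the loop always records n - 1 links, which mirrors the requirement that the final output be a single molecule with no disconnected units.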

The three-dimensional structure of hedgehog is useful to aid in screening and development
of diagnostic and therapeutic protein fragments as in rational drug design, to search
for structural analogs of known protein structures, or to aid in an analysis of biological
function and activity. Also, the method may be used to predict protein secondary
structures and protein subsecondary structures from amino acid sequences alone, and to
predict those regions of a protein molecule that are on the outside and those that are on
the inside.

Compounds can also be prepared using the three-dimensional structure provided herein
and tested using assays known to those of skill in the art. For example, compounds can
be synthesized and screened for hedgehog autoproteolytic activity by cleavage assays
(see for example, Porter et al., Cell 86:21, 1996; WO96/17924, herein incorporated by
reference).

Compounds of the invention include drugs, small molecules, peptides, peptidomimetics,
polypeptides, chemical compounds and biologic agents. For example, peptidomimetics
are synthetic compounds having a three-dimensional structure (i.e., a "peptide motif")
based upon the three-dimensional structure of a selected peptide. The peptide motif
provides the peptidomimetic compound with Hedgehog agonist or antagonist activity that
is substantially the same as, or greater than, the Hedgehog agonist or antagonist activity
of the peptide from which the peptidomimetic was derived. Peptidomimetic compounds
can have additional characteristics that enhance their therapeutic application, e.g.,
enhanced cell permeability, increased receptor or polypeptide binding affinity and/or
avidity, and prolonged biological half-life. The design of peptidomimetic compounds
having agonist or antagonist activity can be aided through computer modeling techniques
well known in the art. Other methods for the design, as well as the preparation, of
peptidomimetic compounds are well known in the art.

Atomic coordinates and structure factors have been deposited in the Brookhaven Protein
Data Bank. Applicant assures complete access and disclosure of these coordinates and
factors upon issuance of a patent.

The following examples are intended to illustrate but not limit the invention. While they 
are typical of those that might be used, other procedures known to those skilled in the art 
may alternatively be used. 

EXAMPLE 1

HEDGEHOG PROTEIN PROCESSING 

The full length form of the hh protein (F) migrates with a mobility corresponding to a
relative molecular mass of 46 kD. FIGURES 1 (A) and (C) are immunoblots with
antibodies against amino- (Ab1) and carboxy-terminal (Ab2) epitopes. GST fusion
proteins containing either residues 83 to 160 or 300 to 391 from Hh protein were
expressed in Escherichia coli, purified as recommended [F. M. Ausubel, et al., Current
Protocols in Molecular Biology (Greene and Wiley-Interscience, New York, 1991)], and
used to immunize rabbits by standard methods. The antibodies were affinity purified on
a column of His6-U protein [E. Harlow and D. Lane, Antibodies: A Laboratory Manual
(Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1988)] linked to Affi-Gel 10
beads (Bio-Rad). The purification was performed as described (Harlow and Lane, supra)
except that the acid and base elutions contained 10 percent dioxane. Biotinylated hh
antibodies were prepared by purifying the rabbit antisera over a protein A column,
followed by biotinylation with the use of the Immunoprobe biotinylation kit (Sigma).

Immunoprecipitations were performed as described (Harlow and Lane, supra) with the use of
cold RIPA lysis buffer containing 0.25 mM phenylmethylsulfonyl fluoride (PMSF) and
5 mM EDTA for tissue homogenization. Lysates were precleared twice with pre-immune
rabbit serum plus protein A beads (Gibco-BRL). Affinity-purified antibodies or
pre-immune serum were then added, and the immunoprecipitation was performed with
protein A beads, with the use of NP-40 lysis buffer for the washes.

Immunoblots were performed with affinity purified Ab1 or Ab2 by either of two
chemiluminescence based protocols. In the first protocol (used in FIGURES 1, 3, and 5)
samples were resolved on 15 percent or 12 percent SDS-polyacrylamide gels (F. M.
Ausubel et al., supra) and transferred to Magnagraph nylon membranes (MSI) by
electroblotting. Blots were developed with the use of an alkaline phosphatase conjugated
donkey anti-rabbit IgG secondary antibody and Lumi-Phos 530 (Boehringer Mannheim)
under recommended conditions. In the second protocol (used in FIGURE 8), samples
were transferred to nitrocellulose filters (Schleicher and Schuell), and blots were
developed using ECL reagents (Amersham) as recommended. The secondary antibody
in this case was horseradish peroxidase conjugated goat anti-rabbit IgG (Jackson
ImmunoResearch). Lanes contain protein from induced untransfected S2 cells (lanes 1
and 13), transfected S2 cells induced to express hh (lanes 2 and 14), imaginal discs (lanes
3 and 15), wild type embryos (lanes 6 and 18), and in vitro translations of synthetic hh
mRNA both in the presence (lanes 5 and 17) and absence of microsomes (lanes 4 and
16).

cDNAs encoding various hh protein species were cloned into the pMK33 vector, which
allows for inducible expression under metallothionein promoter control (M. R. Koelle
et al., Cell 67:59, 1991). Stable S2 cell lines were made by transfection of the hh/pMK33
plasmids with constant selection for hygromycin resistance. Proteins were expressed by
plating a log phase culture of cells diluted to 0.1 A595 units, waiting 48 hours, inducing
with CuSO4 at 0.2 mM final concentration, and harvesting the cells and/or supernatant
24 hours later. Cell samples for immunoblotting were made by adding 10 volumes of 1X
SDS PAGE loading buffer to pelleted cells.

In vitro translations were performed with the use of the TNT coupled
transcription-translation system (Promega). 35S-methionine (DuPont NEN) was used for
detection by autoradiography. In the heparin binding experiment, in vitro translation lysate
with microsomes that produce wild-type hh protein was added to heparin agarose (Sigma) or
Sepharose CL-4B (Pharmacia) beads pre-equilibrated with heparin binding buffer (HBB;
20 mM Tris (pH 7.4), 150 mM NaCl, 0.1 percent Triton X-100). Samples were incubated at
4° C for four hours with gentle rocking. After pelleting the beads, supernatants in some
samples were analyzed (lanes 2 and 4). The beads were then washed 5 times with chilled
HBB and samples (lanes 3 and 5) were subsequently eluted at 80° C for 10 minutes in
SDS PAGE loading buffer (F. M. Ausubel et al., supra).

Embryos from the wild-type Canton-S line and from the matings, hshh/hshh or hshh
H329A/hshh H329A X v; Sco/CyO, enlacZIl::wg (Kassis, et al., Proc. Natl. Acad. Sci.
U.S.A. 89: 1919, 1992), were collected 0 to 16 hours after egg laying (AEL) at 25° C.
They were heat shocked for 30 minutes at 37° C and allowed to recover for 1 hour at
25° C. Embryos in FIGURE 1 (Canton-S) were collected 4 to 8 hours AEL at 25° C. In
preparation for immunoblotting, all embryos were dechorionated in 2.6 percent sodium
hypochlorite and homogenized in 10 volumes of 1X SDS PAGE loading buffer.

Multiple species were detected, and minor cross reactive bands were seen in most samples,
including extracts of induced untransfected S2 cells (lanes 1 and 13). One of these bands
(occurring in both panels) co-migrates with U (at 39 kD) and is particularly abundant in
lane 6 of FIGURE 1 (A).

FIGURES 1 (B) and (D) are blots of samples immunoprecipitated with Ab1 (B, lanes
7-9), Ab2 (D, lanes 19-21), or pre-immune serum (B, lanes 10-12 and D, lanes 22-24).
Detection was with biotinylated derivatives of Ab1 (B) and Ab2 (D). Samples used were:
induced untransfected S2 cells, lanes 7, 10, 19 and 22; transfected S2 cells induced to
express hh, lanes 8, 11, 20 and 23; and embryos, lanes 9, 12, 21 and 24. For either
antibody, hh protein fragments were specifically immunoprecipitated from hh expressing
cells and embryos, but not from untransfected cells. In the schematic diagram of
FIGURE 1 (E), cleavage sites are denoted by arrows. The cleavage site marked by the
asterisk is inferred by identification of only one cleavage product and may therefore occur
at another location within the C fragment. The first two columns to the right of the
diagram indicate the reactivity of Ab1 and Ab2 to each hh fragment. The other columns
indicate the presence (+) or absence (-) of each hh fragment in the various samples.
Parentheses around F and Nss indicate that these species are detected in in vitro
translation reactions but not in vivo.

The 46 kD species was detected from in vitro translation extracts by Ab1 and Ab2
(FIGURE 1, lanes 4 and 16), and was partially converted to a species of 39 kD (U) when
translation occurred in the presence of microsomes (FIGURE 1, lanes 5 and 17). A 39
kD species co-migrating with U is also present in extracts from all in vivo sources, but
none of these extracts contain detectable levels of F. U represents the signal-cleaved
form of F; signal cleavage thus appears to be relatively inefficient in vitro, as reported
previously (J. J. Lee et al., Cell 71:33, 1992), but is highly efficient in vivo. To confirm
that signal cleavage indeed is occurring at this unusual internal location, a mutation that
changes residue to N at the predicted signal cleavage site was introduced. This
mutation prevented conversion by microsomes of F to U and also produced a species that
comigrated with F upon transfection into cultured S2 cells. The effects of independently
mutating the two methionine codons present upstream of the signal sequence were also
examined. In vitro translation of the sequence in which the first methionine is removed
produces a protein species intermediate in mobility between F and U, and this species is
converted to a species that comigrates with U in the presence of microsomes or when
produced in vivo. Alteration of the second methionine codon caused no change in the
electrophoretic mobility of Hh protein produced in vivo or in vitro.

Smaller species of Hh proteins from in vivo sources have been reported previously (T. 
Tabata and T. B. Kornberg, Cell 76: 89, 1994). The latter study examined not 
endogenous proteins, but proteins induced to express at high levels from exogenously
introduced constructs. The antibody used did not distinguish epitopes from distinct
portions of the molecule.

In addition to signal cleavage, a further cleavage of the U precursor is responsible for
generating other forms of hh protein observed in vivo. This was deduced from the
observation that Ab1 and Ab2 both detected the U (uncleaved) species, but also
interacted individually with smaller protein species expressed endogenously in embryos
and imaginal discs or with species expressed upon introduction of the hh gene into S2
cells. Ab1 thus interacts with a 19 kD species from all of these tissues (FIGURE 1, lanes
2, 3, 6, 8, 9), while Ab2 interacts with a 25 kD species and a 16 kD species (FIGURE 1,
lanes 14, 15, 18, 20, 21). The 19 kD species hereafter is referred to as N (N-terminal
fragment), the 25 kD species as C (C-terminal fragment) and the 16 kD species as C*;
these species represent the major forms of endogenous hh protein present in vivo.

The proposed cleavages by which these species arise are shown schematically in the
bottom portion of FIGURE 1. The N and C species are uniquely detected by Ab1 and
Ab2, respectively, and the sum of the relative masses of the two smaller species is
roughly equivalent to the relative mass of U. The electrophoretic mobilities of the F and
U species are somewhat at variance with their predicted relative masses (52.1 kD and
43.3 kD, respectively). The identities of these species were confirmed by in vitro
translation of a variety of hedgehog open reading frames modified to contain different
extents of sequence at the NH2- or COOH-terminus, and by insertion of epitope tags.
The migration anomalies appear to be associated with protein species in which sequences
from both the NH2- and COOH-terminal fragments are simultaneously present. The
mobilities of the NH2- and COOH-terminal fragments, in contrast, correspond to relative
masses (19 kD and 25 kD, respectively) that sum to yield 44 kD, roughly equivalent to
the expected relative mass of U.

A simple mechanism that could account for the derivation of the two smaller species
therefore would be a single internal cleavage of the U precursor. Processing of the hh
protein when translated in vitro also yields a 25 kD species (C; lanes 16 and 17) and
either a 29 kD or 19 kD (N) species (lanes 4 and 5). The 19 kD species comigrates with
N, and its formation depends upon the presence of microsomes, consistent with the
proposal that N derives from F by signal cleavage and a further internal cleavage. The
overall pathway for formation of the predominant forms of hh protein observed in vivo
thus appears to involve signal cleavage of F to generate U. U is then cleaved internally
to form N and C, which are the predominant forms found in vivo. Further processing of
the 25 kD C species might then generate the 16 kD C* species, but whether this
processing is a single cleavage event is not clear, since Ab2 does not recognize the
smaller 9 kD fragment that would result. The processing of C to generate C* appears to
occur with greater efficiency in imaginal discs as compared to embryos (compare lanes
15 and 18); this may be caused by the more extended mass isolation procedure of
imaginal discs (O. M. Eugene, et al., Tissue Culture Assn. Man., 5: 1055, 1979).

EXAMPLE 2 

AUTO-PROTEOLYSIS OF THE HEDGEHOG PROTEIN

The comigration of endogenous and in vitro-generated hh protein species suggested that
in vitro processing is similar to that observed in vivo. FIGURE 2 shows limited sequence
similarity between hh proteins and serine proteinases. Hh protein sequences are aligned
to residues 323 to 329 of the D. melanogaster protein and numbered as positions 1 to 7
(group A). Conserved hh residues are in bold letters. The catalytic histidines (A. J.
Barrett, in Proteinase Inhibitors, A. J. Barrett, G. Salvesen, Eds. (Elsevier, Amsterdam,
1986) pp. 3-22) of mammalian serine proteinases (group B) are aligned to the invariant
histidine at position 7 in Hh proteins. Abbreviations are as follows: C-Shh, chicken Sonic
hh (R. D. Riddle, et al., Cell 75: 1401, 1993); M-Shh, mouse Sonic hh (Y. Echelard et
al., Cell 75: 1417, 1993) (identical to Hhg-1); R-vhh-1, rat vhh-1 (H. Roelink et al., Cell
76: 761, 1994); Z-Shh, zebrafish Sonic hh (S. Krauss, et al., Cell 75: 1431, 1993)
(identical to shh) and zebrafish vhh-1 (H. Roelink et al., supra); twhh, no other
abbreviation; M-Dhh, mouse Desert hh (Y. Echelard et al., Cell 75: 1417, 1993); M-Ihh,
mouse Indian hh (Y. Echelard et al., supra); CHT, bovine chymotrypsin; TRP, bovine
trypsin; ELA, porcine elastase; UKH, human urokinase; C1R, human complement factor
1R; C1S, human complement factor 1S; MCP, rat mast cell protease; FAX, human blood
clotting factor X; TPA, human tissue plasminogen activator.

FIGURE 2 shows that a seven residue region of hh coding sequence (residues 323 to 329
in the Drosophila protein) displays some similarity to the sequences of serine proteases.
This region lies approximately two thirds of the distance from the signal cleavage site to
the carboxy-terminus, and includes Thr and His residues (positions 4 and 7 in FIGURE
2) that are invariant among all hh sequences from all species. In the serine proteases, this
conserved sequence contains an invariant His that acts as a general base in catalysis (A.
J. Barrett, in Proteinase Inhibitors, A. J. Barrett, G. Salvesen, Eds. (Elsevier, Amsterdam,
1986) pp. 3-22).

To determine whether this invariant His residue in the hh protein indeed plays a role in
auto-proteolysis, two proteins from E. coli were purified: one carried the wild type
sequence and the other a substitution of an Ala codon for the His codon at position 329
(H329A). Both of these proteins were engineered to contain a hexa-histidine tag at the
amino terminus fused to Drosophila sequences extending from a residue just before the
signal cleavage site to the carboxy-terminus (residues 83 to 471; the wild type form of
this protein is referred to as His6-U). Both proteins were extensively purified under
denaturing conditions using a Ni-chelating matrix. FIGURE 3(A) is a Coomassie blue
stained polyacrylamide gel that shows the production and purification of His6-U and
His6-U H329A proteins from E. coli. Samples were molecular weight markers (lanes 1 and
2); lysates of E. coli cells carrying the His6-U expression construct without (lane 3) and
with (lane 4) induction by IPTG; purified His6-U protein (lane 5); lysates of E. coli cells
that carry the His6-U H329A expression construct without (lane 6) and with (lane 7)
induction by IPTG; purified His6-U H329A protein (lane 8). Purified proteins were
essentially homogeneous except for several minor species of lower relative mass; these
species are endogenous breakdown products of the full-length proteins since they were
absent in uninduced extracts and were detectable with hh antibodies. FIGURE 3 (B) is
an immunoblot detected with Ab2 showing transfected S2 cells induced to express hh
(lane 1); His6-U and His6-U H329A proteins incubated in cleavage reaction buffer for 0
hours (lanes 2 and 5), for 20 hours (lanes 3 and 6), and for 20 hours in the presence of
20 mM TAME (a serine protease inhibitor) (lanes 4 and 7). Upon incubation the His6-U,
but not the His6-U H329A protein, released a fragment presumed to be C on the basis of
reactivity with Ab2 and co-migration with C produced in S2 cells. Release of C (lane 3)
was only partially inhibited by TAME.

Preliminary proteinase inhibitor studies have been performed on in vitro translated Hh
protein by adding various inhibitors at the start of the translation reaction. These studies
have been complicated by the fact that numerous protease inhibitors lower or block
translation efficiency. In some cases the effectiveness of an inhibitor was assayed by
determining whether addition of the inhibitor to a completed translation reaction inhibits the
self-processing that normally continues to occur. At this time we can only state the
following with certainty: (i) the serine protease inhibitor TAME
(p-toluenesulfonyl-L-arginine methyl ester) inhibits auto-proteolysis of in vitro translated
Hh protein; (ii) soybean trypsin inhibitor, α1-antitrypsin, aprotinin, leupeptin, and E-64
do not block auto-proteolysis of translated Hh protein; and (iii) TAME partially inhibits
auto-proteolysis of purified His6-U protein (FIGURE 3, panel B).

As seen in FIGURE 3B, upon dilution of denaturant the wild type protein but not the
H329A mutant protein released a 25 kD species detectable by Ab2 and identical in
mobility with the C species produced from in vitro translations and various in vivo
sources. This cleavage was also observed when the wild type protein was purified and
renatured by other protocols and cleaved under distinct conditions.

Plasmids encoding the His6-U and His6-U H329A proteins were generated by inserting
sequences corresponding to residues 83 to 471 from the wild-type or hh H329A ORF into
the pRSETB expression vector (Invitrogen). Proteins were induced in BL21(DE3)/pLysS
E. coli cells as described (F. M. Ausubel et al., supra). The basic purification was
performed on Ni-NTA agarose beads (Qiagen) by a denaturing protocol with the use of 6 M
guanidinium HCl and 8 M urea essentially as recommended (a detailed protocol of exact
conditions used is available upon request). Washes contained 0.2 percent Tween 20 and
5 mM β-mercaptoethanol. The final wash buffer was: 6 M urea, 100 mM Tris, 500 mM
NaCl, 20 percent glycerol (pH 7.4). Elutions were with the final wash buffer containing
250 mM imidazole. In vitro cleavage reactions were performed by incubating the purified
protein (diluted 1:30 in the final mix) in cleavage buffer [50 mM Tris, 500 mM NaCl, 5
percent glycerol, 0.2 percent Triton X-100, 50 mM DTT (pH 7.4)]. To isolate soluble
full-length His6-U protein free from denaturants or detergents, additional steps were
taken (this refers to the other renaturation protocols mentioned in the text). Full-length
protein from the eluate described above was further purified from breakdown products
by precipitation, brought about by urea removal through dialysis. The precipitate was then
re-solubilized in a buffer containing guanidinium HCl and loaded onto another Ni-NTA
agarose column. After washing as described, the protein was re-folded (while attached
to the beads) by gradual dilution of urea (from 6 M to 0.5 M) with dilution buffer [100
mM Tris, 500 mM NaCl, 20 percent glycerol (pH 7.4)] over an 8 hour period at 4° C.
The protein was eluted with dilution buffer containing 250 mM imidazole and 0.5 M urea.
The eluate was dialyzed in 100 mM Tris, 150 mM NaCl, 10 percent glycerol (pH 7.4)
at 4° C and stored at -70° C.



EXAMPLE 3

MAPPING THE AUTO-PROTEOLYTIC FUNCTIONS OF hh 

To more precisely define the domain of the hh protein responsible for this
auto-proteolytic event, the effects of several distinct types of mutations upon in vitro
processing were examined. The most informative mutation was a deletion that removes
residues 89 to 254 (Δ89-254), which together constitute most of the amino acids within
the portion of the molecule presumed to form the N fragment. In vitro translations of
wild-type and mutant Hh proteins from Drosophila (FIGURES 4 A-C) and zebrafish
(FIGURE 4D) are shown. The locations of mutations and cleavage sites (arrows) in these
proteins are schematically illustrated (FIGURE 4E). In the Drosophila protein
(FIGURES 4A, B, and C), auto-proteolysis is blocked or severely inhibited by several
mutations in the COOH-terminus (H329A, 294 trunc, 410 trunc, flu408 and 456 trunc),
but is unaffected by a large deletion (Δ89-254) or insertion of a flu-tag epitope trimer
(flu227) in the NH2-terminus. Auto-proteolysis thus depends primarily on residues within
the C fragment (sequences to the right of the cleavage site in the diagram below; see
FIGURE 1). Furthermore, the H329A/flu227 double mutant is not cleaved by wild-type
protein in a mixing experiment (lane 11), suggesting an intramolecular mechanism for
auto-proteolysis. Hh proteins encoded by the zebrafish genes twhh and shh display a
pattern of processing (D) similar to that of the Drosophila protein, although the
NH2-terminal fragment of each zebrafish protein (23 kD for twhh and 22 kD for shh) has
a lower apparent mass than the COOH-terminal fragment (25 kD for twhh and shh). This
is the result of a shorter stretch of residues that precedes the signal sequences as
compared to the Drosophila protein. Processing is blocked by H273A and H270A
mutations in twhh and shh proteins respectively (analogous to the H329A mutation in the
Drosophila protein), which suggests that an auto-proteolytic processing mechanism similar
to that observed for the Drosophila protein is used.

In vitro translations were performed with the use of the TNT coupled
transcription-translation system (Promega). 35S-methionine (DuPont NEN) was used for
detection by autoradiography. In the heparin binding experiment (FIGURE 8C), in vitro
translation lysate with microsomes that produce wild-type Hh protein was added to
heparin agarose (Sigma) or Sepharose CL-4B (Pharmacia) beads pre-equilibrated with
heparin binding buffer (HBB; 20 mM Tris (pH 7.4), 150 mM NaCl, 0.1 percent Triton
X-100). Samples were incubated at 4° C for four hours with gentle rocking. After
pelleting the beads, supernatants in some samples were analyzed (lanes 2 and 4). The
beads were then washed 5 times with chilled HBB and samples (lanes 3 and 5) were
subsequently eluted at 80° C for 10 minutes in SDS PAGE loading buffer (F. M. Ausubel
et al., supra).

All mutations in the hh gene were generated in the plasmid pF1 (J. J. Lee, et al., supra).
Mutations in the zebrafish twhh and shh genes were generated with the original cDNA
clones as described (Ekker, et al., Current Biology, 5(8): 944, 1995). All point mutations
were generated with the use of recombinant circle PCR (D. H. Jones and S. C.
Winistorfer, Biotechniques 12: 528, 1992). The flu408 and flu227 mutations were
generated by inserting a trimer of the influenza hemagglutinin antigen (42 residues for
flu408 and 43 residues for flu227) into the AlwN I and Bgl I sites present in the hh ORF
(nucleotide positions 1604 and 1058 respectively) (J. J. Lee, et al., supra). The Δ89-254
mutation was generated by removing sequences between the EcoN I site (644) and the
Pml I site (1145). The 294 trunc mutation was generated by removing sequences between
the Acc I site (1265) and the Xcm I site (1792). The 410 trunc mutation was previously
generated and identified as Hh m (J. J. Lee, et al., supra). To map the mutation in the hh
13E allele (base change C1756 to A; coding change Tyr457 to STOP), DNA isolated from
hh 13E/TM3 was used to seed PCR reactions generating regions of the hh ORF and
flanking sequences, which were subcloned into Bluescript KSM (Stratagene). Six clones
each, derived from two different PCR amplifications, were sequenced.

As seen in lanes 1 and 2 of FIGURE 4A, the Δ89-254 construct generates a full length
species of a mobility corresponding to the expected relative mass of 33 kD, and two cleaved
products whose apparent relative masses (25 and 9 kD) sum to give approximately the
relative mass of the larger species. The smaller of the cleaved products will occasionally
migrate as two bands, as seen in FIGURE 4A. We have chosen the lower of the two bands
between the 14.3-kD and 6.2-kD markers for our molecular weight measurement. The larger
of the two cleaved products comigrates with the C species produced from the wild type
protein, suggesting that the Δ89-254 hh protein contains the residues normally present in C and
all of the determinants required for auto-proteolysis, including the normal cleavage site;
most of the residues within N are dispensable for auto-proteolytic activity.

In contrast, lesions affecting residues presumed to lie within C block auto-proteolysis in
vitro. All mutations tested by in vitro translation were also examined in S2 cells by
immunoblotting. In all cases the patterns of cleavage in S2 cells were identical to those
observed in translations, except that C* was always present whenever C was formed; C*
was not observed in translations. The lesions within C include the H329A mutation
described above, a mutation that inserts an influenza virus epitope between residues 408
and 409 (flu408), and three mutations that cause premature termination of the protein at
the carboxy terminus. The two most severe truncations, 294 trunc and 410 trunc, are
mutations generated in vitro. They cause a loss of 177 and 61 residues, respectively,
from the carboxy terminus of the protein, and neither undergoes proteolysis. The 456
trunc hh protein is like that encoded by the EMS-induced hh13E mutant allele, which
results in the loss of 15 residues from the carboxy terminus of the protein. This protein
undergoes auto-proteolysis, as demonstrated by the appearance of a 24 kD band in place
of C, but the efficiency of the reaction is much impaired in vitro (FIGURE 4B).
Auto-proteolysis of the hh protein thus relies mainly upon residues within C; deletion or
alteration of residues within this domain is associated with reduced efficiency of processing,
and one such deletion appears to be the cause of the hh13E mutation.
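
The residue counts given for the three truncations can be cross-checked with a short calculation. The sketch below assumes, as the construct names suggest, that each "N trunc" protein retains the first N residues; the dictionary layout and variable names are illustrative only and are not part of the original work.

```python
# Truncation constructs: residues retained (assumed from the construct name)
# and residues lost from the carboxy terminus (as stated in the text).
truncations = {
    "294 trunc": {"retained": 294, "lost": 177},
    "410 trunc": {"retained": 410, "lost": 61},
    "456 trunc": {"retained": 456, "lost": 15},
}

# Full-length size implied by each construct: retained + lost.
implied_lengths = {
    name: t["retained"] + t["lost"] for name, t in truncations.items()
}

# All three constructs should imply the same full-length protein.
consistent = len(set(implied_lengths.values())) == 1

for name, length in implied_lengths.items():
    print(f"{name}: implied full length = {length} residues")
print("consistent:", consistent)
```

Under this assumption the three constructs agree with one another, each implying a full-length open reading frame of 471 residues.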

The sequence homology and auto-proteolytic function of the full length hh protein
suggested the possibility that F or the C fragment is a sequence-specific protease. As a
first step in clarifying the mechanism of auto-proteolysis, an influenza virus epitope tag
was introduced into the N-terminus of a hh open reading frame that also carried a H329A
mutation. FIGURE 4C shows that the insertion of the epitope tag alone does not interfere
with auto-proteolysis (lane 9), and yields a normal C fragment and an N fragment of
increased relative mass (compare to wild type in lane 12). The protein carrying both
mutations does not undergo proteolysis (lane 10), and since the epitope-tagged N
fragment migrates differently from N, this double mutant provides an ideal substrate to
look for intermolecular cleavage upon mixture with a wild type sequence. Lane 11
shows that in such a mixture, although normal N is formed, no tagged N can be detected.
Thus, in this experiment, no appreciable intermolecular cleavage occurs. We also failed
to detect intermolecular cleavage in the following two experiments: (i) co-transfection
of wild type and 410 trunc sequences into S2 cells (the cleaved 410 trunc protein would
yield a smaller and therefore identifiable form of C); (ii) mixing of excess unlabelled,
purified His6-U protein with labelled, in vitro translated H329A mutant protein. Thus,
although an intermolecular mechanism for regulation of auto-proteolysis or for cleavage
of other proteins cannot be ruled out, the current evidence suggests that cleavage of the
hh protein occurs predominantly by an intramolecular mechanism.

The hh gene has been broadly conserved in evolution, with single homologues
identified in a wide variety of invertebrate species and multiple distinct homologues
in each of several vertebrate species (Y. Echelard et al., Cell 75: 1417, 1993; S. Krauss
et al., Cell 75: 1431, 1993; H. Roelink et al., Cell, supra). As seen in FIGURE 2, all of
these coding sequences contain an invariant histidine and other conserved residues at a
position corresponding to H329 in the Drosophila protein. In addition, the protein
encoded by at least one of the mouse genes appears to be processed in vivo to yield two
smaller species in a manner resembling the in vivo processing of the Drosophila protein.
To determine whether auto-proteolysis may also play a role in vertebrates, we examined
the behavior of proteins encoded by two distinct hh homologues from the zebrafish, twhh
and shh. FIGURE 4D demonstrates that when these sequences are translated in vitro,
smaller species are generated whose relative masses sum to yield approximately the
relative mass of the full length protein (lanes 1 and 3). As seen in lanes 2 and 4, this
cleavage reaction is blocked by substitution of Ala codons for the His codons at positions
corresponding to H329 in Drosophila (see FIGURE 2). Vertebrate hh proteins thus
appear to be processed by a mechanism similar to that of the Drosophila protein.

EXAMPLE 4 
ROLE OF AUTO-PROTEOLYSIS IN EMBRYOS 

Numerous functions for the hh gene have been described in Drosophila. At the
morphological level these include a role in patterning of larval cuticular structures and
adult structures such as the eye and appendages (C. Nüsslein-Volhard and E. Wieschaus,
Nature 287: 795, 1980; and J. Mohler, Genetics 120: 1061, 1988); the mechanistic basis
for these morphological effects involves signaling for maintenance or induction of gene
expression in embryos and imaginal discs (J. J. Lee, supra; T. Tabata and T. B.
Kornberg, Cell 76: 89, 1994; and K. Basler and G. Struhl, Nature 368: 208, 1994). To
ascertain the importance of auto-proteolysis for these functions, the H329A mutant gene
under control of the hsp70 promoter was introduced by P element-mediated transformation
into the Drosophila germline. The hshh H329A construct was made identically to
the hshh construct with the use of a hh ORF fragment containing the H329A mutation.
Transgenic flies were generated from a y1 w1118 parental strain using standard methods of
P element-mediated transformation (A. C. Spradling and G. M. Rubin, Science 218: 341,
1982). A line, HA3, carrying the hshh H329A P element on the second chromosome was
maintained as a homozygous stock. To assay for expansion of wg stripes, embryos
collected at 4 to 6 hours after egg laying (AEL) at 25°C were subjected to the following
heat shock protocols prior to fixation. Embryos receiving single shocks (10 or 30 minutes
at 37°C) were allowed to recover for 1 hour at 25°C. Embryos receiving double shocks
(two 10 minute or two 30 minute shocks at 37°C) were allowed to recover 90 minutes
after the first shock and 40 minutes after the second (both recoveries were at 25°C). The
double 30 minute protocol was as previously described (S. Krauss, supra). In situ
hybridizations were performed as described (D. Tautz, Chromosoma 98: 81, 1989) using
a wg specific probe (D. T. Chang et al., supra). Embryos assayed for cuticle phenotype
were heat shocked 6 to 8 hours AEL for 30 minutes at 37°C, allowed to develop at 25°C
for 36 hours and then processed and mounted as described (M. Ashburner, Drosophila:
A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, 1989).
Immunolocalizations (single or double stains) were performed as described, with the use
of affinity purified Ab1 or Ab2 for the primary antibody and alkaline phosphatase (AP)
or horseradish peroxidase (HRP) conjugated anti-rabbit or anti-mouse IgG (Jackson
Immunoresearch) for the secondary. Embryos from a hh13E/TM3 ftz-lacZ stock (the balancer
chromosome was from the Bloomington Stock Center, strain 3218) that were homozygous
for the hh13E allele were identified by the lack of staining with an anti-β-galactosidase
antibody (Promega) in a double stain with Ab2 (FIGURE 9, panel D). Staining in
FIGURE 9, panels B and C, was performed on formaldehyde-fixed Canton-S embryos with
the use of an AP conjugated anti-rabbit IgG secondary. Although standard formaldehyde
fixation was generally used, heat and acid-formaldehyde fixation also gave similar
results. GST fusion proteins containing either residues 83 to 160 or 300 to 391 from the
Hh protein were expressed in E. coli, purified as recommended (F. M. Ausubel et al.,
supra), and used to immunize rabbits by standard methods. The antibodies were affinity
purified on a column of His6-U protein (Harlow and Lane, supra) linked to Affi-Gel 10
beads (Bio-Rad). The purification was performed as described (Harlow and Lane, supra)
except that the acid and base elutions contained 10 percent dioxane. Biotinylated hh
antibodies were prepared by purifying the rabbit antisera over a protein A column,
followed by biotinylation with the use of the Immunoprobe biotinylation kit (Sigma).
Immunoprecipitations were performed as described (Harlow and Lane, supra) with the
use of cold RIPA lysis buffer containing 0.25 mM PMSF and 5 mM EDTA for tissue
homogenization. Lysates were precleared twice with pre-immune rabbit serum plus
protein A beads (Gibco BRL). Affinity purified antibodies or pre-immune serum was
then added, and the immunoprecipitation was performed with protein A beads, with the
use of NP-40 lysis buffer for the washes.

FIGURE 5 (A) and (B) are immunoblots developed with the use of Ab1 and Ab2
antibodies, respectively. Lanes 1 and 6, induced untransfected S2 cells; lanes 2 and 7,
transfected S2 cells induced to express hh; lanes 3 and 8, heat shocked wild-type
embryos; lanes 4 and 9, heat shocked hshh embryos; lanes 5 and 10, heat shocked hshh
H329A embryos. In heat shocked hshh embryos, the wild-type Hh protein is both
induced and properly processed to generate the U, N, C and C* species seen in other
expression contexts. In contrast, the H329A protein is induced but not appreciably processed
in hshh H329A embryos (the low levels of processed species in lanes 5 and 10 are probably
from endogenous hh expression since they are seen at identical levels in heat shocked
wild-type embryos in lanes 3 and 8).

FIGURE 5 shows that heat shock induction of the H329A construct results in the formation
of an abundant species that corresponds to U, based on its mobility and its interaction with
Ab1 and Ab2 (lanes 5 and 10). In contrast, induction of wild type hh protein using a similar
construct resulted in similar levels of the N and C processed products (lanes 4 and 9), with
very little uncleaved U. Thus, as observed in vitro and in S2 cells, the H329A mutation in
embryos appears to greatly reduce the efficiency of auto-proteolytic cleavage of the hh
protein.

In FIGURE 6, the embryonic distribution of wingless (wg) RNA as revealed by in situ
hybridization is shown in (A) wild-type (homozygous y1 w1118), (B) hshh, and
(C) hshh H329A embryos that were exposed to two 10 minute heat shocks separated by
a 90-minute recovery period. Wild-type (control) embryos showed little change in wg
expression, whereas the induced wild-type protein and, to a lesser extent, the H329A protein
each produced ectopic wg expression (Table 1). Panels (D), (E), and (F) show the dorsal
surfaces of y1 w1118, hshh, and hshh H329A larvae, respectively, at the level of the fourth
abdominal segment. These larvae were shocked for 30 minutes as embryos and allowed to
complete embryogenesis. Cuticle cell types (1°, 2°, 3°, and 4°) are labeled as described (J.
Heemskerk and S. DiNardo, Cell 76: 449, 1994). Note the expansion of 2° cell types
(naked cuticle) at the expense of 3° and some 4° types in the hshh embryo (E) under
conditions where the phenotype of hshh H329A embryos (F) is identical to that of control
embryos (D).

Perhaps the earliest known requirement for Hh protein is in maintenance of an adjacent
stripe of wingless (wg) gene expression in each embryonic segment (A. Martinez Arias
et al., Development 103: 157, 1988; and S. DiNardo et al., Nature 332: 604, 1988). This
requirement is deduced from the loss of wg expression when hh function is absent; in
addition, the ubiquitous expression of wild-type Hh protein induces expansion of the
domain of wg gene expression (P. W. Ingham, Nature 366: 560, 1993). The effects of the
H329A mutation upon wg expansion were examined by heat shocking embryos carrying
the H329A mutant construct in parallel with embryos containing the wild-type construct.
Although the H329A mutant protein is able to induce some expansion of the wg domain,
the efficiency of this activity is impaired relative to that of the wild-type protein
(FIGURE 6, B and C; Table 1). The difference in efficiency ranges nearly as high as
threefold depending upon the heat shock regime, and these results suggest that
auto-proteolysis of the Hh protein is important for optimal activity in embryonic signaling
to induce wg expression.

TABLE 1

Wild-type and mutant hh activity in embryonic induction of wg expression*

                             minutes of heat shock

              10               30               10/10            30/30

hshh          1.0 ± 0.3 (93)   1.5 ± 0.6 (120)  2.9 ± 0.3 (41)   2.8 ± 0.4 (54)

hshh H329A    0.7 ± 0.5 (190)  0.9 ± 0.4 (111)  1.1 ± 0.4 (145)  1.9 ± 0.5 (93)

* Expansion of wg expression beyond wild-type controls is given as average number of
cell diameters ± standard deviation with number of embryos scored in parentheses.
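
The difference in efficiency between the two constructs can be read off Table 1 by taking, for each heat shock regime, the ratio of the mean expansion for hshh to that for hshh H329A. The short sketch below does this; treating the ratio of means as the measure of relative efficiency is an assumption made here for illustration, and the variable names are hypothetical.

```python
# Mean expansion of wg expression (cell diameters beyond wild-type controls),
# taken from Table 1, keyed by heat shock regime (minutes).
mean_expansion = {
    "hshh":       {"10": 1.0, "30": 1.5, "10/10": 2.9, "30/30": 2.8},
    "hshh H329A": {"10": 0.7, "30": 0.9, "10/10": 1.1, "30/30": 1.9},
}

# Ratio of wild-type to H329A mean expansion for each regime.
fold_difference = {
    regime: mean_expansion["hshh"][regime] / mean_expansion["hshh H329A"][regime]
    for regime in mean_expansion["hshh"]
}

# Regime with the largest wild-type/H329A ratio.
max_regime = max(fold_difference, key=fold_difference.get)

for regime, ratio in fold_difference.items():
    print(f"{regime:>5} min: {ratio:.1f}-fold")
print("largest difference:", max_regime)
```

On these means the largest ratio falls in the double 10 minute regime, at a little over 2.6-fold, in line with a difference that ranges nearly as high as threefold.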

The effects of Hh protein on the patterning of cuticular structures are most clearly visible
on the dorsal surface of the larva, where four distinctive cell types can be identified in
each parasegment. These cell types have been designated 1°, 2°, 3°, and 4°, from
anterior to posterior, with hh transcription occurring in precursors of the 1° cells (J.
Heemskerk and S. DiNardo, supra). Differentiation of the first three cell types was
shown to be dependent upon hh gene function, and it has been proposed that the fates of
these cells are determined by the concentration of Hh protein, with highest concentrations
producing the 1° fate, intermediate concentrations producing the 2° fate, and the
lowest concentrations producing the 3° fate (J. Heemskerk and S. DiNardo, supra). This
proposal was supported by observations that the most anterior cell types display the
greatest sensitivity to a reduction of hh expression, and that all of the 3° and some of the
4° bristles are replaced by naked cuticle characteristic of the more anterior 2° cell type
when hh is expressed ubiquitously at high levels. We have reproduced the suppression of 3°
and some 4° fates by heat shock induction of embryos that carry our wild-type construct
(FIGURE 6E), but find that the H329A mutant is unable to alter cell fates in the dorsal
cuticle of the larva (FIGURE 6F). Auto-proteolysis, or perhaps some other function
blocked by the H329A mutation, thus appears to be essential for the patterning influence
of Hh protein upon the dorsal cuticle.



EXAMPLE 5 
EFFECTS OF THE H329A MUTATION 
UPON SIGNALING IN IMAGINAL DISCS 

Studies of the H329A mutant protein were extended to its function in the patterning of adult
structures and signaling within imaginal discs. In the eye imaginal disc hh function is
required for appropriate development of pattern (J. Mohler, Genetics 120: 1061, 1988;
J. J. Lee, supra; and J. Mohler and K. Vani, supra) and more recently has been shown
to control progression of a wave of differentiation via induction of decapentaplegic (dpp)
gene expression in the morphogenetic furrow of the eye (U. Heberlein et al., Cell 75:
913, 1993; and C. Ma et al., Cell 75: 927, 1993). In leg and wing discs, ectopic
expression of hh has also been shown to yield pattern duplications and defects and is
associated with induction of ectopic expression of other signaling molecules normally
expressed in a zone along the anterior/posterior compartment boundary (T. Tabata and
T. B. Kornberg, Cell 76: 89, 1994; and K. Basler and G. Struhl, Nature 368: 208, 1994).

For studies of signaling in imaginal discs, a thermal cycler was utilized to subject larvae
carrying heat shock-inducible hh constructs to successive rounds of heat shock and
recovery. The effects of temperature cycling upon expression of dpp and wg in imaginal
discs were examined by monitoring β-galactosidase expression from a reporter gene
carrying dpp promoter sequences or from an enhancer detector P element inserted in the
wg gene. In FIGURE 7, X-gal staining was used to follow expression of wg (A-C) or dpp
(D-L) in imaginal discs of late third-instar larvae that carry wg-lacZ or dpp-lacZ
reporter genes. Leg (A-F), wing (G-I) and eye-antennal discs (J-L)
from control larvae (A, D, G, J), larvae carrying the hshh transgene (B, E, H, K) and
larvae carrying the hshh H329A transgene (C, F, I, L) are displayed. In all panels anterior
is to the left. Arrows highlight the following features: an ectopic patch of dpp expression
in the anterior compartment of wing discs in hshh H329A larvae (I); and an ectopic band
of dpp expression in the eye portion of the eye-antennal disc anterior to the morphogenetic
furrow (marked by the other band of dpp expression more posteriorly) in hshh larvae (K).
Expansion into the anterior compartment of wg expression in leg discs, and of dpp
expression in leg and wing discs, in hshh larvae is similar to that described for the ectopic
expression of hh. Morphological changes in the anterior compartment of leg (B and E)
and wing discs (H) were also as described (K. Basler and G. Struhl, supra). In contrast,
discs from hshh H329A and control larvae showed very little change in wg and dpp
expression, even under prolonged heat shock conditions, and morphological changes were
never observed. Panels (M-O) show the eye phenotypes of adult control (M), hshh (N) and
hshh H329A (O) flies that were shocked during larval development in a manner similar to
that of the imaginal disc experiments above. Duplicated eye structures were observed in hshh
flies, but never in hshh H329A flies. The arrow in (N) points to a thin strip of cuticle
between the two eye structures. Other deformities were also seen in hshh flies (for
example, compare the thorax in N to M).

Virgin female flies from the homozygous lines hshh (D. T. Chang et al., Development,
1994, in press), hshh H329A, and y1 w1118 were crossed to males from the homozygous
BS3.0 line (bearing a P element dpp reporter construct on the 2nd chromosome, referred
to as dpp-lacZ) (R. K. Blackman et al., Development 111: 657, 1991) or the line y; Sco/
CyO, enlacZ11::wg (bearing a wg reporter P element enhancer trap on a second
chromosome balancer; called wg-lacZ) (J. A. Kassis et al., Proc. Natl. Acad. Sci. U.S.A.
89: 1919, 1992). Progeny were grown at 25°C in aerated 0.5-ml microcentrifuge tubes
containing yeast paste until the late second instar or early third instar stage of larval
development. The larvae were then cycled continuously at 37°C for 30 minutes followed
by 25°C for 90 minutes in a Perkin-Elmer thermal cycler until they reached the late third
instar stage. They were subsequently dissected and stained with X-gal as described (M.
Ashburner, supra) or allowed to grow to adulthood for phenotypic analysis.
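
The cycling regime above lends itself to a quick tally of how much heat shock the larvae receive. The sketch below assumes uninterrupted cycling for a whole number of days, taking the two- and three-day durations reported for these experiments; the function name and its parameters are hypothetical and serve only to illustrate the arithmetic.

```python
def cycling_summary(days, shock_min=30, recovery_min=90):
    """Return (complete cycles, total minutes at heat shock temperature).

    Assumes continuous cycling with no pauses; one cycle is a heat shock
    at 37 C followed by recovery at 25 C.
    """
    cycle_min = shock_min + recovery_min      # 120 minutes with the defaults
    total_min = days * 24 * 60                # minutes of cycling
    cycles = total_min // cycle_min           # complete cycles only
    return cycles, cycles * shock_min

for days in (2, 3):
    cycles, shock_total = cycling_summary(days)
    print(f"{days} days: {cycles} cycles, {shock_total / 60:.0f} h at 37 C")
```

With the stated 30 minute shock and 90 minute recovery, each cycle lasts two hours, so a quarter of the cycling period is spent at the heat shock temperature.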

As shown in FIGURE 7A, wg expression normally occurs in a ventral sector of the leg
disc along the anterior/posterior compartment boundary while dpp is expressed in the
dorsal portion of the disc along this boundary (FIGURE 7D). Although thermal cycling
of larvae carrying the wild-type hh gene produced abnormal leg disc morphology and
extensive ectopic expression of both target genes, as previously reported for ectopic hh
expression (FIGURE 7, B and E), the H329A construct produced little if any detectable
difference in these patterns of expression (FIGURE 7, C and F). Ectopic hh expression
in the wing disc also leads to morphological changes and expanded expression of dpp
(compare FIGURE 7, G and H), but the H329A construct produced only an occasional
small patch of anterior ectopic expression (FIGURE 7I).

Ubiquitous expression of wild-type hh also leads to ectopic expression of dpp in the
eye-antennal disc (compare FIGURE 7, J and K). In the antennal portion of this disc the
expansion of dpp expression resembles that observed in leg discs. In the eye portion of
the disc dpp expression is observed at its normal location in the furrow; however, ectopic
expression also occurs in the form of a second dorso-ventral band at a location somewhat
anterior to the furrow, thus giving the appearance of an eye disc with two morphogenetic
furrows (FIGURE 7K). Indeed, in adults derived from temperature-cycled larvae that
carry the wild-type hh construct, an apparently duplicated eye structure such as that in
FIGURE 7N can be observed, with two eye structures separated by a thin strip of cuticle
(arrow). The H329A mutant protein, in contrast, did not induce expansion of dpp
expression in either portion of the eye-antennal disc (FIGURE 7L), and did not induce
eye duplications or cuticle defects in the adult (FIGURE 7O).

The experiments described thus far comprise multiple series of larvae subjected to two
days of thermal cycling followed by immediate dissection for analysis of imaginal
structures or further incubation at constant temperature for analysis of adult structures.
Although the H329A protein appeared to have little activity in these experiments, the
small patch of ectopic dpp expression induced in the wing disc (FIGURE 7I, arrow)
suggested that some residual activity remained. This suggestion was borne out in a
similar experiment involving three days of cycling prior to dissection: the H329A protein
clearly displayed some dpp-inducing activity in this experiment, presumably as a result
of the higher amounts of protein that accumulated during the longer cycling period. The
wing in particular, but also other imaginal discs, displayed low and variable amounts of
ectopic dpp expression. This expression in all cases was far less extensive than that
observed for the wild-type construct examined in parallel; furthermore, morphological
deformations of the imaginal discs, although quite common with the wild-type protein,
were extremely rare with the H329A protein. Although its potency is greatly reduced
relative to wild-type, the H329A protein retained at least some activity in early
embryonic and imaginal disc induction of wg and dpp expression; in contrast, even under
heat shock conditions far more severe than those required for effects by the wild-type
protein, the H329A mutant remained completely inert with respect to the re-specification
of cell fates in the dorsal cuticle of the larva.

EXAMPLE 6
DIFFERENTIAL RELEASE OF N AND C 
INTO CULTURED CELL SUPERNATANTS 

A puzzling feature of hh function is its apparent short-range action in settings such as
embryonic and imaginal disc signaling to wg and dpp, and longer-range action in other
settings, such as patterning of the dorsal larval cuticle. These observations and the
existence of two major protein products in vivo prompted us to look for differences in
the solubility or diffusibility of N and C expressed in S2 cultured cells. FIGURES 8 (A)
and (B) are immunoblots of cell pellets (lane 1) or supernatants (lane 2) from transfected
S2 cell cultures expressing Hh protein, developed with Ab1 (A) and Ab2 (B). Samples
in each lane were from the same volume of resuspended total culture. Whereas N
remained mostly associated with the cell pellet (compare lanes 1 and 2 in A), C was
nearly quantitatively released into the supernatant (compare lanes 1 and 2 in B). U
displayed partitioning properties in between those of N and C (A and B). FIGURE 8 (C)
demonstrates the heparin binding activity of various Hh protein species generated by in
vitro translations with microsomes. Samples were: total translation mix (lane 1);
supernatant after incubation with heparin agarose or agarose (control) beads (lanes 2 and
4); and material eluted from heparin agarose or agarose beads after washing (lanes 3 and
5). F, U, Nss and N fragments are depleted from reactions incubated with heparin agarose
but not agarose beads (compare lanes 2 and 4 to lane 1), and the same species subsequently
can be eluted from the heparin agarose but not the agarose beads (compare lanes 3 and
5 with lane 1). FIGURES 8, A and B, indeed show that these proteins behave differently,
with most of the N fragment remaining cell-associated and all, or nearly all, of C being
released into the culture supernatant.

One possible explanation for this differential behavior might be association of the N
fragment with extracellular matrix proteins on the surfaces of the S2 cells. Accordingly,
the relative affinity of these two proteins for heparin agarose was examined, since
heparin binding is a common property of proteins that associate with the extracellular
matrix. Given the obvious difficulty in obtaining soluble N from cultured cells, in vitro
translation in the presence of microsomes was used to generate soluble, labelled N and
C. As shown in FIGURE 8C, N but not C is depleted from these translation extracts by
treatment with heparin agarose beads, while treatment with unmodified agarose beads
does not deplete either fragment. Furthermore, N but not C was retained on the heparin
agarose beads after extensive washing with a solution that contains 0.1% Triton X-100
and 150 mM NaCl; in contrast, neither fragment was retained by unmodified agarose. N,
but not C, thus binds tightly to heparin, and this behavior suggests that the low concentration
of N released into culture supernatants may be the result of binding to the extracellular
matrix. Another mechanism that might contribute to the differential release of N and C
into culture supernatant would be the expression in S2 cells of a receptor for N but not
for C. Our current data cannot distinguish these possibilities.

EXAMPLE 7 

DISTINCT EMBRYONIC LOCALIZATIONS OF N AND C 

The differential release of N and C into cultured cell supernatants suggested the
possibility that these fragments might also be differentially localized in embryos.
Previously reported hh protein localizations utilized either antibodies specific for N
epitopes or antibodies unable to distinguish between N and C. FIGURE 9 shows the
differential localizations of N and C in embryos. The in situ localization of the hh
transcript (9A) is shown in comparison to the distribution of N and C epitopes
detected with Ab1 and Ab2 in panels (9B) and (9C), respectively. Note that the
distributions of N and C epitopes span approximately one-third and one-half of each
segmental unit, respectively, while the transcript is limited to approximately one-quarter
of each unit. In (9D), the localization of C epitopes in embryos homozygous for the hh13E
allele is detected with the use of Ab2. C epitopes in this mutant, which displays impaired
auto-proteolytic activity, are more restricted, and resemble the wild-type localization of
N. Homozygous hh13E embryos were identified by loss of a marked balancer from a
heterozygous parent stock. All embryos are at mid to late stage 9 (extended germ-band).

In accordance with these reports, Ab1, which is specific for N epitopes, reveals a
segmentally localized distribution (FIGURE 9B) that is slightly broader than that of
the hh transcript at the same stage (FIGURE 9A). Also consistent with these reports, we
observed that N epitopes at later stages accumulate in large punctate structures. Our
analysis here concentrates on the earlier stage, when antibody staining is weaker but
before formation of the invaginations and grooves that later crease the epidermis and
thereby complicate the interpretation. Ab2 was also utilized to detect C-specific epitopes
with a variety of fixation and staining procedures. Although detection of C epitopes
above background is more difficult than for N, we consistently observed a segmentally
modulated pattern, albeit with a broader distribution than N (FIGURE 9C). This
localization is also distinctive in that C epitopes at early or late stages are not found in
the punctate structures characteristic of N.

The hh13E mutation encodes a prematurely truncated protein that is missing 15 residues
normally present at the COOH-terminus. Because this protein displays a much reduced
efficiency in auto-proteolysis, the distribution of C in this mutant background was
examined. FIGURE 9D shows that C epitopes in a homozygous hh13E embryo (identified
by absence of a marked balancer) are distributed in a much tighter segmental pattern than
in wild-type. This localization resembles that of N, and we thus conclude that the broad
distribution of C epitopes normally seen is altered in hh13E by retention of the uncleaved
precursor near the site of synthesis.

EXAMPLE 8 

THE ROLE OF AUTO-PROTEOLYSIS IN BIOGENESIS 
OF ACTIVE HEDGEHOG PROTEIN 

In addition to signal cleavage, the hh protein undergoes auto-proteolysis at an internal
site to generate the predominant protein species observed in vivo. All or most of the
amino acid residues required for this auto-proteolysis function map to C, the
carboxy-terminal product of this internal cleavage. In an effort to determine the
importance of auto-proteolysis for function, we introduced a single residue mutation
(H329A) that blocks auto-proteolysis of the hh protein in vitro and demonstrated that
both processing and function of this protein are impaired in vivo. Since similar levels of
induced protein were detected from a strain carrying the wild-type construct or from
several strains carrying independent insertions of the mutant construct (FIGURE 5), the
impaired function of the H329A protein relative to wild-type is not the result of reduced
levels of expression. Further evidence in support of a role for auto-proteolysis derives
from the effect of the hh13E mutation, which reduces but does not eliminate
auto-proteolysis of the hh protein in vitro (FIGURE 4). Correspondingly, the hh13E
mutation is associated with a phenotype of intermediate strength in vivo (J. Mohler,
supra).

Curiously, the H329A Hh protein appears to retain weak activity in embryonic signaling
to induce ectopic wg expression and, to a lesser degree, can function in imaginal disc
signaling for induction of ectopic wg and dpp expression. In contrast to its retention of
at least some signaling functions in embryonic and imaginal tissues, the H329A protein is
completely inert when assayed for the ability to reprogram cell fates in the dorsal cuticle
of the larva.

The assays in which the H329A protein is active or partially active involve short-range
signaling that normally occurs across one or at most several cell diameters; in contrast,
the H329A protein fails to exert any effect upon patterning of the dorsal cuticle, a
long-range activity that normally operates across most of the segment. Previous
proposals to account for long-range patterning activities have suggested that hh
expression induces other signaling molecules which are then responsible for executing
the patterning functions (the signal relay model; see FIGURE 10A). FIGURE 10 shows
the signal relay and dual function models for hh protein action. In FIGURE 10 (A), the
long-range effects of hh signaling are achieved indirectly through short-range induction
of a second signaling molecule (X). Based on its biochemical properties and its
restricted tissue localization, N is presumed to represent the active short-range signal
while the role of C would be limited to supplying the catalytic machinery required for
biogenesis of N. In (10B), the long- and short-range signaling functions of hh are
supplied by the N and C proteins derived by internal auto-proteolysis of the U precursor.
N is implicated in short-range signaling by retention near its cellular site of synthesis,
while C is less restricted in its distribution and would execute long-range signaling
functions. In both models, auto-proteolysis is required to generate fully active signaling
proteins. See text for further discussion.

These proposals seek to maintain a consistent mode of hedgehog action by rationalizing
the apparent long-range activities of hh products as indirect consequences of short-range
signaling. Based on the distribution observed, the active molecule in this model might
be N and the role of C would then be limited to supplying the catalytic machinery
required for biogenesis of N.

Our evidence suggests an alternative model, the dual function model (FIGURE 10B), in 
which long- and short-range activities of the hh protein might be executed by N and C,
the two predominant forms of the molecule observed in vivo. The nearly quantitative
release of C fragment into the culture medium of hh-expressing S2 cells and its broad,
though segmentally modulated, distribution within embryos suggest that C might execute
or contribute to long-range signaling functions. The N fragment, on the other hand, 
predominantly remains associated with the expressing S2 cells and also binds to heparin,
which suggests a possible association with the extracellular matrix. These properties and
the segmentally restricted embryonic distribution of N are suggestive of a role in the 
execution of short-range hh signaling activities. Since the vertebrate Hh proteins we 
tested also appear to be auto-processed and also carry predicted heparin binding sites just 
carboxy-terminal to their signal sequences (H. Roelink et al., supra), many aspects of the
dual function model discussed here in the context of Drosophila development may also
apply to hh protein function in vertebrate development.

Execution of short-range functions by N would be consistent with the observation that
the H329A mutant protein has at least partial function in signaling for the induction of 
wg and dpp, since this mutation does not alter residues located in the amino-terminal 
portion of the protein that normally would give rise to N. The uncleaved H329A protein
thus would carry all the residues that normally interact with a presumed receptor for N,
although there might be some effect on the affinity of the interaction due to the presence 
of carboxy-terminal sequences, thus accounting for the decreased potency of the H329A 
protein. Alternatively, the partial function of H329A protein may derive from an
extremely small fraction of protein that appears to be cleaved: a very faint band with
identical mobility to C appears in in vitro translations with the H329A protein (FIGURE
4, lane 3). Execution of long-range functions by C is also consistent with our 
observations because long-range signaling might require the release of the C fragment 
or otherwise require the H329 residue for some function other than for cleavage. 

When N is synthesized from a native construct (wild type hh), it remains primarily
cell-associated (FIGURE 10C); however, N generated from a truncated construct in cultured
cells predominantly enters the culture medium (FIGURE 10D) (for constructs, see
Porter, et al., Nature, 374:363, 1995). These results further confirm that autoprocessing
by fragment C may regulate the degree of N association with the cell surface and 
therefore its range of action. 

EXAMPLE 9

ISOLATION OF HEDGEHOG HOMOLOGUES 

The mouse and human hh-like sequences were isolated by polymerase chain reaction
(PCR) using primers degenerate for all possible coding combinations of the sequences
underlined in FIGURE 1 of Chang, et al. (Development, 120:3339, 1994). PCR amplifications
contained from 100 ng to 2 µg genomic DNA (depending upon the genome size of the
species), 2 µM of each primer, 200 µM dNTPs (Pharmacia), 1X reaction buffer
(Boehringer-Mannheim) and 2.5 units Taq polymerase (Boehringer-Mannheim) in 50 µl
reactions. Amplification was as follows: 94°C 5 min, addition of Taq polymerase at
75°C, followed by 94°C 1 min, 52°C 1.5 min and 72°C 1 min for 30 cycles and a final
extension of 72°C for 5 min. All PCR products were cloned into pBluescript
(Stratagene) prior to sequence determination.
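
As a rough cross-check of the reaction conditions above, the following Python sketch computes the amount of each primer and of dNTPs in one 50 µl reaction and the nominal hold time of the cycling program. It assumes that the stated concentrations are final concentrations, and it ignores ramp times and the pause for Taq addition at 75°C, which the text does not time. All names are illustrative.

```python
# Sketch only: arithmetic on the stated PCR conditions.
# Assumption: 2 µM primer and 200 µM dNTPs are final concentrations.

REACTION_VOLUME_UL = 50.0
PRIMER_UM = 2.0
DNTP_UM = 200.0

INITIAL_DENATURATION_MIN = 5.0          # 94°C
CYCLE_STEPS_MIN = [                      # one amplification cycle
    ("94C denature", 1.0),
    ("52C anneal", 1.5),
    ("72C extend", 1.0),
]
CYCLES = 30
FINAL_EXTENSION_MIN = 5.0                # 72°C


def amount_pmol(conc_um: float, volume_ul: float) -> float:
    """1 µM equals 1 pmol/µl, so µM x µl gives pmol."""
    return conc_um * volume_ul


def program_minutes() -> float:
    """Sum of programmed hold times; ramping and pauses are not included."""
    per_cycle = sum(minutes for _, minutes in CYCLE_STEPS_MIN)
    return INITIAL_DENATURATION_MIN + CYCLES * per_cycle + FINAL_EXTENSION_MIN


primer_pmol = amount_pmol(PRIMER_UM, REACTION_VOLUME_UL)
dntp_nmol = amount_pmol(DNTP_UM, REACTION_VOLUME_UL) / 1000.0
total_min = program_minutes()

print(f"each primer: {primer_pmol:.0f} pmol per reaction")
print(f"dNTPs: {dntp_nmol:.0f} nmol per reaction")
print(f"programmed hold time: {total_min:.0f} min")
```

Under these assumptions each reaction holds 100 pmol of each primer and 10 nmol of dNTPs, and the programmed hold time is 115 minutes.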

Mouse clones obtained in this manner contained 144 bases of sequence between the
primer ends and were labelled with [α-32P]dATP and used for high stringency screens of
mouse cDNA libraries made from whole 8.5 dpc embryonic RNA and from 14.5 dpc
embryonic brain in the λZAP vector (a gift from A. Lanahan). Several clones
corresponding to Hhg-1 were isolated and the largest, 2629 bp in length (pDTC8.0), was
chosen for sequence analysis using dideoxy chain termination (Sanger, et al., 1977) and
Sequenase v2.0 (US Biochemicals). Compressions were resolved by using
7-deaza-guanosine (US Biochemicals). Sequence analysis made use of the Geneworks 2.0
(IntelliGenetics) and MacVector 3.5 (IBI) software packages.

One of the three mouse clones, Hhg-1, when used as a probe, yielded a 2.0 kb clone from
an 8.5 dpc mouse embryonic cDNA library and a 2.7 kb clone from a 14.5 dpc embryonic
cDNA library. The 2.7 kb cDNA appears to represent a nearly full length mRNA
because it corresponds to a 2.7 kb band detected by hybridization on a Northern blot.
The largest methionine-initiated open reading frame within this cDNA encompasses 437
codons, and is preceded by one in-frame upstream stop codon. Sequence comparisons
indicate that the protein encoded by Hhg-1 is identical to the independently characterized
mouse Shh (Echelard, et al., Cell, 75:1417-1430, 1993) except for an arginine to lysine
difference at residue 122. Hhg-1 also corresponds closely to the rat vhh-1 gene (97%
amino acid identity; Roelink, et al., Cell, 76:761-775, 1994), the chicken Sonic hedgehog
(81% identity; Riddle, et al., Cell, 75:1401-1416, 1993) and Shh from the zebrafish (68%
identity; Krauss, et al., Cell, 75:1431-1444, 1993; Roelink, et al., Cell, 76:761-775,
1994). The PCR-generated fragments Hhg-2 and Hhg-3 appear to correspond to the
Indian and Desert classes of mouse hedgehog genes, respectively (Echelard, et al., Cell,
75:1417-1430, 1993).

Alignment of the Hhg-1 open reading frame with the two Drosophila hh sequences
showed that all three proteins contain hydrophobic amino acid sequences near their 
amino-termini; the hydrophobic stretches within the D. melanogaster protein (residues 
64 to 83) and within the mouse protein are known to act efficiently as signal sequences
for cleavage (Lee, et al., Cell, 71:33-50, 1992). Both Drosophila signal sequences are
unusual in their internal locations, while the hydrophobic stretch of the mouse gene 
occurs at the extreme amino-terminus, a more conventional location for cleaved signal 
sequences. Although portions of sequence N-terminal to the Drosophila signal 
sequences are conserved, suggesting a functional role, the mouse gene lacks this region. 
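
One common way to locate hydrophobic stretches such as those described above is a sliding-window hydropathy average. The Python sketch below uses the Kyte-Doolittle hydropathy scale; the text does not say which method was used here, so this is an illustration rather than the procedure actually followed. The window length and the toy sequence are hypothetical and the toy sequence is not a hedgehog sequence.

```python
# Sketch only: sliding-window hydropathy (Kyte-Doolittle scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int) -> list:
    """Mean hydropathy of every window of the given length along seq."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def most_hydrophobic_window(seq: str, window: int) -> tuple:
    """Return (0-based start, mean score) of the most hydrophobic window."""
    profile = hydropathy_profile(seq, window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, profile[start]


# Hypothetical toy sequence: charged flanks around a 10-residue apolar core.
toy = "DEKRDEKRDE" + "LLVVAILLVV" + "KDERKDERKD"
start, score = most_hydrophobic_window(toy, window=10)
print(f"most hydrophobic window starts at residue {start + 1}, mean {score:.2f}")
```

An internal signal sequence, as in the Drosophila proteins, would show up as a peak well inside the profile, whereas a conventional amino-terminal signal sequence would peak at the start.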

The overall level of amino acid identity between Hhg-1 and hh carboxy-terminal to the
signal sequences is 46%. A closer examination shows that the amino-terminal portion,
from residues 25 to 187, displays 69% identity, while remaining residues in the
carboxy-terminal portion display a much lower 31% identity. Like hh, the Hhg-1 coding
sequence is divided into three exons, and the boundaries of these exons are at the same
positions within coding sequence as those of the three Drosophila hh exons. Curiously,
the boundary between coding sequences of the second and third exons occurs near the
transition from high to low levels of overall sequence conservation. The coincidence of
these two boundaries suggests a possible demarcation of functional domains within these
proteins. This location within Hhg-1 coding sequence also coincides approximately with
the site of a presumed proteolytic cleavage.
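
The region-by-region comparison above can be sketched as a percent-identity calculation over chosen columns of a pairwise alignment. In the Python example below, the function name, the use of alignment columns rather than residue numbers, and the two toy sequences are assumptions made for illustration; the toy sequences are not hedgehog sequences and the numbers they give are not those reported above.

```python
# Sketch only: percent identity over a column range of a pairwise alignment.

def percent_identity(a: str, b: str, start: int = 1, end=None) -> float:
    """Percent identical positions between two aligned sequences.

    start and end are 1-based, inclusive alignment columns.
    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    end = len(a) if end is None else end
    pairs = [
        (x, y)
        for x, y in zip(a[start - 1:end], b[start - 1:end])
        if x != "-" and y != "-"
    ]
    if not pairs:
        raise ValueError("no comparable columns in the requested range")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical aligned toy sequences: a conserved first half, a diverged second half.
seq_a = "ACDEFGHIKL" + "MNPQRSTVWY"
seq_b = "ACDEFGHIAA" + "MNPAAAAAAA"

overall = percent_identity(seq_a, seq_b)
first_half = percent_identity(seq_a, seq_b, 1, 10)
second_half = percent_identity(seq_a, seq_b, 11, 20)
print(overall, first_half, second_half)
```

A sharp drop between adjacent regions, as in this toy case, is the kind of transition the text relates to the exon boundary and the presumed cleavage site.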

EXAMPLE 10 
HUMAN CLONING OF hh GENES

Partial sequence for two human hh genes has been obtained by DNA sequencing of
clones derived by PCR amplification from genomic DNA with hh-specific degenerate
primers as outlined in Chang, et al. (Development, 120:3339, 1994) and EXAMPLE 9
(FIGURE 11A and B). More extensive screening by the same approach, either with the
same primers or with other primers from the hh coding region or with the human hh
fragments seen in FIGURES 11A and B, is expected to yield at the least a third gene, and
possibly more, since at least three genes are found in the mouse. These segments of
human hh genes can be used to obtain full coding sequences for human proteins by the
following cloning methods, which are commonly used by those of skill in the art and are
extensively described in the literature.

For example, ready-made cDNA libraries or RNAs from a variety of human sources, 
including various fetal stages and organs (from abortuses) and specific infant or adult 
organs (from pathological or autopsy specimens), are being tested for the presence of hh
sequences by PCR or RT-PCR using the primers described in Chang, et al., supra, and
other primers derived directly from the sequence of the human fragments. Ready-made 
libraries containing hh sequences are being screened directly and, where necessary, new 
libraries are being constructed by standard methods from RNA sources containing hh 
sequences. The probe for these screens is a mixture of all the distinct human hh
fragments. Sequences of cDNA clones can then be determined. Most clones containing
the probe sequences, which are located in the N region, will also include a full C coding 
region since standard methods of library construction result in cDNA clones that are most 
complete at their 3' ends. All full length hh-coding sequences obtained previously in
vertebrates and invertebrates contain N and C sequences encoded in a single RNA. 

Screening is continued until complete open reading frames that correspond to all of the
fragments of human hh genes are obtained. Specifically, 1.2 x 10^6 clones from a human
fetal brain library (Stratagene, La Jolla, CA) were screened using a mixture of the two
human hh fragments (FIGURE 11A and B) as probes. Twenty-nine clones were 
identified as specifically hybridizing with these probes. 

Second, the RNA sources identified as containing hh sequences can be used as templates
for anchored PCR (also referred to in the literature as RACE, for rapid amplification
of cDNA ends). Briefly, this method provides a means to isolate further mRNA
sequence in either the 5' or 3' direction provided that sequence is known from an internal
starting point. Anchored PCR can also be used to isolate sequences from a cDNA library.

Third, genomic libraries can be screened with the probes described in the first technique. 
Where necessary, human hh exons and coding sequences are being identified by
hybridization to previously isolated human and mouse coding sequences, by sequence
determination, and by exon-trapping methods to identify all hh coding sequences within
genomic clones; these coding sequences can be "stitched" together by standard 
recombinant DNA methods to generate complete hh open reading frames. 

FIGURE 12A and B show in vitro cleavage reactions of a Drosophila hh protein
produced in E. coli and purified to homogeneity. This protein has residues 89-254
deleted, rendering it more soluble and easier to purify. It also contains a His6 purification
tag appended to the N-terminus. Autoproteolysis of this protein is triggered by the
addition of reducing agents (DTT), and the resulting product corresponds to the C
fragment identified in vivo. FIGURE 12, Panel A shows a time course of cleavage after
initiation by addition of DTT. Panel B shows incubations of concentrations ranging over
three orders of magnitude for a fixed time period (four hours), with no difference in the
extent of conversion to the cleaved form. This concentration-independent rate of 
cleavage indicates an intramolecular mechanism of cleavage. Panel C shows the 
sequence around the cleavage site as determined by amino-terminal sequence of the
cleaved fragment C. The cleavage site is denoted by the arrow, and the actual residues
sequenced by Edman degradation of the C fragment are underlined. Panel C also shows
an alignment of all published vertebrate hh sequences plus some unpublished
sequences from fish and Xenopus. The sequences shown correspond to the region of 
Drosophila hh where the cleavage occurs, and demonstrate the absolute conservation
of the Gly-Cys-Phe sequence at the site of cleavage. Panel D shows an SDS-PAGE gel
loaded with in vitro transcription/translation reactions as described in the previous
Examples, using various hh genes as templates. dhh is Drosophila, twhh and zfshh are
the tiggy-winkle and sonic hh genes of the zebrafish, and mshh is the shh/Hhg-1/vhh-1
gene of the mouse. The translation mix included 35S-labelled cysteine, used to visualize
the resulting products by autoradiography. Note that each gene gives a larger product (the
precursor or U) and two smaller products of cleavage (N and C). The larger species is
C for each of the vertebrate genes, whereas the Drosophila N is larger than C due to the
presence of ~60 residues occurring amino-terminal to the signal sequence that are not
present in the vertebrate open reading frames. This panel shows that vertebrate hh proteins
are processed similarly to the Drosophila protein. Panel E shows that Edman degradation
of the C fragments releases 35S counts on the first but not subsequent rounds for all these
proteins, indicating that the site of autoproteolytic cleavage for all of these hh proteins
is the amide bond to the amino-terminal side of the Cys residue that forms the center of
the conserved Gly-Cys-Phe sequence highlighted in panel C. This is a generalizable
approach to establish the composition of protein fragments from any other hh family 
members. 
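
The reasoning behind the concentration-independence argument for Panel B can be illustrated with a short Python sketch comparing a first-order (intramolecular) reaction with a simple second-order (intermolecular) one. The rate constants and concentrations below are hypothetical values chosen only to show the trend; they are not measurements from FIGURE 12.

```python
# Sketch only: fraction of precursor cleaved after a fixed time under
# two idealized rate laws. All rate constants are hypothetical.
import math


def fraction_cleaved_first_order(k_per_h: float, hours: float) -> float:
    """Intramolecular cleavage: d[U]/dt = -k[U], so f = 1 - exp(-k t)."""
    return 1.0 - math.exp(-k_per_h * hours)


def fraction_cleaved_second_order(k2_per_um_h: float, conc_um: float,
                                  hours: float) -> float:
    """Intermolecular cleavage: d[U]/dt = -k2[U]^2, so f = k2*U0*t / (1 + k2*U0*t)."""
    x = k2_per_um_h * conc_um * hours
    return x / (1.0 + x)


HOURS = 4.0                                  # fixed incubation time, as in Panel B
concentrations_um = [0.1, 1.0, 10.0, 100.0]  # hypothetical, three orders of magnitude

intra = [fraction_cleaved_first_order(0.5, HOURS) for _ in concentrations_um]
inter = [fraction_cleaved_second_order(0.05, c, HOURS) for c in concentrations_um]

for c, f1, f2 in zip(concentrations_um, intra, inter):
    print(f"{c:7.1f} µM  first-order {f1:.3f}  second-order {f2:.3f}")
```

In the first-order case the starting concentration never enters the expression, so the extent of conversion after four hours is the same at every dilution; in the second-order case it rises steeply with concentration.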

EXAMPLE 11 

DIFFERENTIAL EXPRESSION OF TWO hh GENES IN AXIAL
MESODERM AND IN NEURAL PROGENITORS

Partial sequences corresponding to five distinct zebrafish hh-like genes were isolated and
the complete coding sequences for two of these genes were obtained from an embryonic
cDNA library. One of these two sequences is identical to that of the zebrafish vhh-1 gene
(Roelink, et al., Cell, 76:761, 1994), and appears to correspond to the shh gene reported
by Krauss, et al. (Cell, 75:1431, 1993) (see FIGURE 13 description); the other gene,
tiggy-winkle (Potter, B., The Tale of Mrs. Tiggy-Winkle, The Penguin Group, London,
1905), represents a novel vertebrate hh. Coding sequences for both are shown in
alignment to mouse and chicken sequences of the sonic/vhh-1 class (FIGURE 13b). Like
other vertebrate hh homologues, the twhh and shh proteins contain an amino-terminal
stretch of hydrophobic residues. These residues function as signal sequences since
cleavage is observed when coding sequences are translated in the presence of
microsomes; vertebrate hh genes thus appear to encode secreted proteins, as previously
reported for Drosophila hh (Kimmel, C.B. & Warga, R.M., Developmental Biology,
124:269-280, 1987; Warga, R.M. & Kimmel, C.B., Development, 108:569-580, 1990).

The first four sequences were isolated from zebrafish genomic DNA (a gift from J.
Pellegrino) using degenerate primers in polymerase chain reactions as described (Chang,
et al., supra). twhh and shh clones were isolated from a 20-28 hour cDNA library (a gift
from R. Riggleman, K. Helde, D. Grunwald and J. Pellegrino) using the first three 
sequences as probes. The translational reading frames for twhh and shh were closed 12 
and 16 codons, respectively, upstream of the putative initiating methionine. 
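
Checking that a reading frame is closed upstream of a candidate initiating methionine amounts to walking backwards from the ATG one codon at a time until a stop codon is met. The Python sketch below shows that check on a hypothetical toy cDNA; the function name and the sequence are illustrative and are not taken from the twhh or shh clones.

```python
# Sketch only: distance, in codons, from an ATG to the nearest in-frame
# upstream stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_to_upstream_stop(cdna: str, atg_index: int):
    """Return how many codons upstream of the ATG the frame is closed.

    atg_index is the 0-based position of the A of the ATG. A stop codon
    immediately preceding the ATG gives 1. Returns None if no in-frame
    stop is found before the 5' end of the sequence.
    """
    seq = cdna.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("atg_index does not point at an ATG")
    distance = 1
    pos = atg_index - 3
    while pos >= 0:
        if seq[pos:pos + 3] in STOP_CODONS:
            return distance
        pos -= 3
        distance += 1
    return None


# Hypothetical toy cDNA: 2 nt, a TAA stop, 11 sense codons, then ATG.
toy_cdna = "GG" + "TAA" + "GCT" * 11 + "ATG" + "GCTTAA"
atg = toy_cdna.index("ATG")
closed_at = codons_to_upstream_stop(toy_cdna, atg)
print(f"frame closed {closed_at} codons upstream of the ATG")
```

An in-frame upstream stop supports the assignment of the initiating methionine, because translation cannot have started further upstream in that frame.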

FIGURE 13 shows the predicted amino acid sequences in single letter code.
13(a) shows sequences common to five distinct hh-like genes, with a dot
indicating identity with the corresponding residue of zebrafish tiggy-winkle (twhh,
Potter 1905; supra). hh[zfB] and hh[zfC] is more diverged and appears to represent a
novel class. 13(b) shows amino acid sequences of twhh and shh aligned to those of
the sonic/vhh-1 class from chick and mouse (Riddle, et al., Cell, 75:1401-1416, 1993;
Chang, D.T., et al., Development, supra; Echelard, Y., et al., Cell, 75:1417-1430, 1993).
Zebrafish sonic hedgehog (shh) is identical in sequence to z-vhh-1 reported by Roelink,
et al., Cell, 76:761-775, 1994. Based on expression and extensive sequence identity
throughout most of the coding region, vhh-1 and the sonic sequence reported here
probably correspond to shh of Krauss, et al., Cell, 75:1431-1444, 1993, although that
shh sequence diverges dramatically throughout a 26 residue stretch near the
carboxy-terminus. Rat vhh-1/sonic hh (Roelink, et al., supra) was excluded in this
alignment because of its 97% sequence identity to the predicted mouse protein. Residues
identical in all four sequences are boxed, and a dash indicates a gap in the alignment.
The arrow indicates the predicted signal sequence cleavage site (von Heijne, G.,
Nucleic Acids Res., 14:4683-4690, 1986) for twhh. The amino-terminal hydrophobic
stretch common to all four hh genes is shaded. 13(c) shows percent identity of residues
carboxy-terminal to the hydrophobic region.

FIGURE 14 shows the comparative expression of twhh, shh, and pax-2 during zebrafish
embryogenesis. Whole mount in situ hybridizations on 0-36 hour embryos were
performed using a modification of the procedure of Tautz and Pfeifle, Chromosoma,
98:81-85, 1989, with antisense probes. Transcript localization is revealed by the purple
product of an alkaline phosphatase enzymatic reaction. Staging of the embryos is
according to Westerfield, M. (The Zebrafish Book, University of Oregon Press, Eugene,
1993). Transcripts were visualized by in situ hybridization to whole embryos. (a, b)
twhh expression in a single late shield stage embryo. (a) Dorsal view, animal pole is to
the top. The triangular shape of expression is characteristic of axial mesoderm-forming
cells of the hypoblast (Stachel, S.E., et al., Development, 117:1261-1274, 1993). (b)
Lateral view: the thicker layer of cells on the left (dorsal) side of the embryo is the
embryonic shield; the two arrows indicate the twhh-expressing hypoblast cells and the
non-expressing epiblast. Anterior is to the left in all subsequent embryos. Dorsal is to
the top in all lateral views. (c, d) A single embryo at the end of gastrulation (100%
epiboly) with twhh-expressing cells. (d) Caudal-dorsal view. Note the wide patch of
stain in the presumptive tailbud which narrows anteriorly. (e-j) Early somitogenesis
(11.5 hour, 3-4 somite) embryos; optic vesicles have not begun to evaginate from the
wall of the diencephalon. (e, h, k) Lateral views of developing brain. (f, i, l) Dorsal
views of developing brain. (e, f, g) Localization of twhh expression in a single row of
cells that will form the floor plate. The arrowhead marks a patch of twhh-expressing
cells lateral to the tailbud. (h, i, j) Localization of shh. shh is also expressed strongly
in the protuberance. (j) Lateral view of developing tail; shh is expressed in cells
that will form both floor plate and notochord. (k, l, m) Localization of pax-2 during early
optic vesicle formation; (m) also shows twhh expression. (k) 12 hour (4-5 somites)
embryo. (l) 12.5 hour (5-6 somites) embryo. Expression of pax-2 in the developing
optic vesicle is in a gradient away from the protuberance. Note the expression of pax-2
(asterisk) at the future midbrain-hindbrain border. (m) twhh (arrow) and pax-2
expression in a 6-7 somite (13 hour) stage embryo. Note differential expression of twhh
in ventral neural keel (corresponding to neural tube in other vertebrates). (n-s) Embryos
at end of somitogenesis (22-24 hours). (n, o, p) Localization of twhh. (n, o) Developing
brain. Note isolated groups of cells staining in the diencephalon (filled triangles) and the
protuberance (arrowhead), and floor plate expression underlying the midbrain and
hindbrain. The floor plate expression is contiguous caudally along the axis. (n) Lateral
view. (o) Dorsal view. (p) Lateral view of tail. Expression is restricted to the floor
plate. (q, r, s) Localization of shh. (q, r) Developing brain. (q) Lateral view; pax-2
expression in the otic vesicle is indicated. (r) Dorsal view. Expression in the
protuberance (arrowhead) and in the neural keel. (s) Lateral view of tail. Expression is
strongest in the floor plate, but contrary to the report of Krauss, et al., supra, is still also
in the notochord. Abbreviations: white e - epiblast; h - hypoblast; tb - tailbud; p -
protuberance; e - eye; ov - optic vesicle; ot - otic vesicle; fp - floor plate; nc - notochord;
asterisk - midbrain-hindbrain boundary or pax-2-labeled prospective midbrain-hindbrain
boundary; t - telencephalon.

Comparison of twhh and shh expression patterns (Krauss, et al., supra) reveals that both
genes are predominantly expressed in midline structures, albeit with notable differences
in regard to timing, rostro-caudal extent, and tissue restriction. Expression of twhh is
first detected during gastrulation in the dorsal mesoderm (FIGURE 14a, b); this
expression occurs in a band corresponding to a subset of the embryonic shield, a
structure analogous to Spemann's organizer in Xenopus (Stachel, et al., Development,
117:1261-1274, 1993, and references therein; Ho, R., Seminars in Developmental Biology,
pg. 3, 1992). In concert with the movements of convergence and extension, this band of
twhh expression shortens along the equatorial plane and extends along the incipient
embryonic axis until, by the end of gastrulation, expression occurs throughout the entire
axis (FIGURE 14c, d). Early in somitogenesis, twhh RNA is found restricted to presumptive
ventral neural tissue along the entire body (FIGURE 14e, f, g), the only exception being
cells in and near the tailbud (FIGURE 14g). In contrast to the neural restriction of twhh,
shh is localized both to presumptive neural and notochordal cells (FIGURE 14j).

As somitogenesis proceeds, ventral midline expression of shh and twhh is reduced in
most of the prospective forebrain, but remains strong in an anterior patch of midline cells 
within the floor of the prospective diencephalon (FIGURES 14e, f for twhh; FIGURES
14h, i for shh). This patch later will give rise to the protuberance (Schmitt, E.A. and
Dowling, J.D., J. Comp. Neur., 344:532-542, 1994), an anterior extension of the
diencephalon. This structure, which is medial and just rostral to the developing optic 
stalks, is the site we propose as the focus of early patterning activity for the developing 
eyes (see below). By the end of somitogenesis, both twhh and shh are strongly expressed 
in the floor plate (FIGURES 14p, s), although shh transcripts remain detectable in the
notochord at this stage and at 36 hours of development (FIGURE 14s; later stage not
shown). At 28 hours, twhh transcripts are also found in a small cluster of cells within the
first gill arch (not shown), as also reported for shh at 33 hours of development (Krauss,
et al., supra).

Differences between twhh and shh expression are apparent from the beginning of
gastrulation, since twhh RNA can be detected as early as the shield stage while shh is
first detected later, at about 60% epiboly (not shown; Krauss, et al., supra). In addition,
twhh transcripts are restricted to neural tissues early in development, and are never
detected in the notochord (compare FIGURE 14g to FIGURE 14j). Later differences in
expression include differential rostro-caudal restriction within the diencephalon and
midbrain and weaker and more restricted expression of twhh in the protuberance
(compare FIGURES 14n and 14q), such that the later domain of twhh expression in the
brain appears to constitute a subset of the shh domain. In addition, shh but not twhh is
expressed in the developing fin bud (Krauss, et al., supra). Comparison of shh and twhh
expression patterns to those previously reported for hh homologues in zebrafish and other
vertebrate species indicates that shh is the zebrafish homologue of the sonic/vhh-1 class
while twhh represents a novel class of vertebrate hh.

EXAMPLE 12

DEVELOPMENTAL CONSEQUENCES OF ECTOPIC hh 
EXPRESSION DURING ZEBRAFISH EMBRYOGENESIS 

To gain insight into the potential roles of hh products in development, synthetic twhh and
shh mRNAs were injected into 1-8 cell embryos. This technique yields a mosaic but fairly
uniform pattern of expression, as determined for the control mRNA encoding
β-galactosidase (not shown). Uniformity of expression is in good agreement with fate
mapping studies of the early zebrafish embryo (Kimmel & Warga, supra; Warga &
Kimmel, supra; Helde, et al., Science, 265:517-520, 1994), which indicate that
blastomeres undergo extensive cell mixing during the cleavages prior to gastrulation. We
note that mosaicism of expression caused surprisingly little variation in the phenotypes 
of the hh injected embryos, possibly due to secretion of hh gene products. 

Embryos injected with synthetic twhh or shh mRNA (hh RNA) exhibited numerous yet 
highly reproducible abnormalities in comparison to control embryos injected with lacZ 
mRNA. These abnormalities, discussed below, are primarily defects in the brain and
eyes. Although the effects of ectopic twhh and shh expression were qualitatively similar, 
the incidence and severity were greater with twhh RNA (see text below, FIGURE 15 and 
FIGURE 16). The proteins encoded by these two genes have qualitatively similar 
biological activities, but apparent differences in potency. 

FIGURE 15 shows the effects of ectopic hh on zebrafish development. Wild type
zebrafish (Danio rerio, Ekkwill Waterlife Resources) were maintained at 28.5°C; some
embryos were then cultured overnight at RT. Zebrafish embryos were injected at the 1-8
cell stage with twhh, shh, or lacZ RNA and examined at 28 h of development. (a-c)
Dorsal view of the midbrain-hindbrain region; anterior is left. (a) lacZ. (b) twhh. (c) shh.
(d-f) Frontal optical section of the forebrain region; anterior is up. (d) lacZ. (e) twhh.
(f) shh. (g-i) Lateral view of the eye region; anterior is left. (g) lacZ. (h) twhh. (i) twhh.
At levels caudal to the prospective brain, the notochord, somites, and neural keel formed
by most hh-injected embryos appeared grossly normal except for an overall shortening
and dorsal curvature of the axis. A minority of hh-injected embryos (15%; not shown)
displayed partially bifurcated axes, containing duplicated axial mesoderm and parallel
neural keels, each neural keel comprising ventral midline cells and some bilaterally
symmetric lateral cells (not shown). Although we have not determined the primary cause
of these axial defects, analysis of late gastrulation stage embryos suggests that the
bifurcation may result from difficulties in epiboly and convergence. Abbreviations: mv
- mesencephalic ventricle; rv - rhombencephalic ventricle; asterisk - midbrain-hindbrain
boundary; ot - otic vesicle; tv - third (diencephalic) ventricle; r - retina or retina-like
structure; l - lens or lens-like structure; pe - pigmented retinal epithelium.

Morphological defects in the brain and other rostral neural derivatives occur at high
frequency in hh-injected embryos. The three ventricles of the fish brain normally
apparent at 28 hours of development - the rhombencephalic, mesencephalic (FIGURE
15a), and diencephalic (third ventricle; FIGURE 15d) - are not formed in the brains of
hh injectees (FIGURES 15b, c; FIGURES 15e, f), despite the obvious presence of a
lumen. The prominent constriction normally present at the midbrain-hindbrain boundary
also is absent (compare FIGURE 15a to FIGURES 15b, c). Formation of this
constriction requires function of pax-2 (Krauss, et al., Nature, 353:267-270, 1991;
Krauss, et al., Nature, 360:87-89, 1992), which normally is expressed in a band at the
midbrain-hindbrain boundary (Krauss, et al., supra; Krauss, et al., Development,
113:1193-1206, 1991). pax-2 expression at this boundary is not disrupted by hh RNA
injection, however, indicating that this phenotype does not result from disruption of
rostro-caudal information.

Defects in eye development also occur at high frequency in embryos injected with hh
RNA. Thus, while at 28 hours the normal zebrafish eye has a lens and a retina with
pigmented epithelium (FIGURE 15d, g), hh-injected embryos usually fail to develop
lenses and retinal pigmentation (FIGURE 15e, h). Eye duplications are also observed
at low frequencies (FIGURE 15i). The poorly developed eyes do not appear to result
from a simple delay in development since pigmentation elsewhere in injected embryos
appears in its normal time course. Examined at three days of development, the
consequences of hh RNA injection include defects that range from complete absence of
eyes to partially formed eyes lacking a ventral portion of the retina.

The eye phenotypes caused by hh RNA injection resemble those produced by treatment
of zebrafish and Xenopus laevis embryos with retinoic acid. In Xenopus, phenotypes
range from reduction of the eye and absence of the lens to eyes with retinal folds
(resembling duplicated eyes) and multiple small lenses (Manns, M. & Fritzsch, B.,
Neurosci. Lett., 127:150-154, 1991). In zebrafish, exposure to retinoic acid during
gastrulation interferes with the formation of the eye (Holder, N. & Hill, J., Development,
113:1159-1170, 1991), while exposure during formation of the optic primordia induces
formation of duplicated retinas and extra lenses (Hyatt, et al., Proc. Natl. Acad. Sci. USA,
89:8293-8297, 1992). Patterning effects of retinoic acid upon the developing chick limb
appear to be mediated through ectopic activation of the endogenous sonic hh gene
(Riddle, et al., supra); these results with ectopic hh expression suggest the possibility of
a similar mechanism underlying the patterning effects of retinoic acid treatment in the
vertebrate eye.

EXAMPLE 13 

hh EXPRESSION IN THE OPTIC VESICLE SPECIFIES 
PROXIMAL FATES AT THE EXPENSE OF DISTAL FATES

To further elucidate the role of hh in eye development, pax-2 and pax-6
(Krauss, et al., EMBO J., 10:3609-3619, 1991; Puschel, et al., Development,
114:643-651, 1992) were utilized as positional markers to examine the effects of ectopic
hh expression on the optic vesicle. As the optic vesicle evaginates from the lateral walls
of the zebrafish forebrain (Schmitt, E.A. & Dowling, J.D., J. Comp. Neur., 344:532-542,
1994), pax-2 is expressed in a gradient, with highest RNA levels in the anterior and
ventral regions of the optic vesicle (Krauss, et al., supra; FIGURE 14k, l, m).
Immediately adjacent to the maximum of this pax-2 expression gradient is the region of
the diencephalon termed the protuberance (Schmitt & Dowling, supra), where both twhh
and shh but not pax-2 are strongly expressed (FIGURES 14e, f, h, i, m). The concentration
gradient of pax-2 expression in the optic vesicle thus appears to incline downward
from its maximum at a location adjacent to the site of twhh and shh expression in the
protuberance. Superposition of developmental fate within the optic vesicle (Schmitt, et
al., supra) upon the pattern of pax-2 expression suggests that the gradient of pax-2 RNA
prefigures the future proximal/distal axis of the eye.

Ectopic hh alters the expression of pax-2, pax-6, and F-spondin. Zebrafish embryos were
injected at the 1-8 cell stage with twhh or shh RNA and the pattern of pax-2, pax-6, or
F-spondin expression was examined by whole mount in situ hybridization. Control
injections with lacZ RNA were performed in every case, and the control embryos displayed
wild-type expression patterns. At embryo stage, the anterior-posterior axis of the optic
vesicle corresponds to the future proximal-distal axis of the eye. During the next hour of
development, the posterior edge of the optic vesicle will separate from the diencephalon
(Schmitt and Dowling, J. Comp. Neur., 344:532-542, 1994).

Injection of either hh RNA causes uniform initiation of pax-2 expression along both the
proximal-distal and dorsal-ventral axes of the optic vesicle as it begins to evaginate. The
ectopic pax-2 expression appears at the same time as normal pax-2 expression is initiated
in the eye and, in some cases, is also seen in the diencephalon between the optic vesicles.
At the end of somitogenesis, a time when pax-2 would normally be restricted to the optic
stalk, pax-2 RNA in hh-injected embryos is detected in all but the most distal portion of
the optic vesicle.

The effects of ectopic hh on expression of pax-6, which encodes a transcription factor
critical for eye development, were also studied. At 22 hours of zebrafish development,
pax-6 is normally expressed in the lens and in most of the distal part of the optic cup
(Krauss, et al., supra; Puschel, et al., Development, 114:643-651, 1992). In hh-injected
embryos, pax-6 is repressed in the optic vesicle, although many embryos retain pax-6
expression in the most distal cells. With regard to pax-2 and pax-6 as markers of
positional identity, hh expression in the optic vesicle can be characterized as inducing
proximal fates and repressing distal fates.

The distal part of the optic vesicle is the most refractory to hh-induced changes in both
pax-2 and pax-6 gene expression. Due to a later rotation, this distal portion of the optic
vesicle will give rise to the dorsal portion of the mature eye (Schmitt, et al., supra);
interestingly, this is the portion of the eye that remains in 3-day old injected embryos
with intermediate phenotypes (see above).

Lesions in the pax-6 gene have been assigned as the basis for the Aniridia (Ton, et al.,
Cell 67:1059-1074, 1991; Glaser, et al., Nat. Genetics, 2:232-239, 1992), Small eye
(Hill, et al., Nature, 354:522-525, 1991), and eyeless mutations (Quiring, et al., Science,
265:785-789, 1994) in humans, mice and Drosophila, respectively; pax-6 function thus
appears to be critically required for eye development in Drosophila and mammals. As
we argue here, hh-encoded activities also appear to play a role in vertebrate eye
development, and this suggests a further molecular parallel between vertebrates and
insects, since the role of hh in Drosophila eye development is well established (Mohler,
et al., supra; Ma, et al., supra; Heberlein, et al., supra; Lee, et al., supra). The reciprocal
and non-overlapping patterns of hh and pax-6 expression in the developing Drosophila
eye (Ma, et al., supra; Quiring, et al., Science, 265:785-789, 1994) suggest the
possibility of pax-6 repression by hh, but whether hh functions by similar mechanisms
in vertebrate and Drosophila eye development is a question that requires further
investigation.

In mice, the dosage of pax-6 protein is crucial for normal eye development (Hill, et al.,
supra). Small eye heterozygotes develop an abnormally small lens (Hogan, et al., J.
Embryol. Exp. Morph., 97:95-110, 1986; Hogan, et al., Development, 103 Suppl.:115-119,
1988), as do hh-injected embryos with weaker phenotypes (FIGURE 14f). In Small eye
homozygotes, the eyes lack lenses and eventually degenerate, and the animals lack eyes at
birth (Hogan, et al., 1986, supra; Hogan, et al., 1988, supra), as do many of the
hh-injected embryos at three days of development. These parallels suggest that many of
the later eye defects observed in hh-injected zebrafish may be caused by partial or
complete repression of pax-6 during eye development.

EXAMPLE 14 

GENETIC ABLATION OF hh FOREBRAIN EXPRESSION CAUSES LOSS 
OF PROXIMAL FATES IN THE OPTIC VESICLE 

The patterns of twhh and shh expression (FIGURE 14) and the effects of ectopic hh
expression (FIGURE 15) are consistent with a normal role for shh and twhh in eye
development. If hh activities indeed play a normal role in promoting proximal fates
within the developing eye, removal of hh activities would be expected to result in a loss
of proximal fates. In embryos homozygous for the cyclops mutation, ventral neural
structures fail to form and the developing eyes fuse at the midline, yielding an embryo
with a single eye (Hatta, et al., Nature, 350:339-341, 1991). The missing ventral
structures in cyclops mutants include the regions where we observe expression of twhh
and shh, and we therefore examined the effects of the cyclops mutation on hh expression.

cyc b16 (Hatta, et al., Nature, 350:339-341, 1991) heterozygous adults (a kind gift of R.
Riggleman) were spawned and their offspring analyzed by whole mount in situ
hybridization. pax-2 and either twhh or shh RNAs were detected in embryos homozygous
for the cyc mutation and in their wild-type siblings. twhh RNA is expressed only in the
presumptive tailbud (caret) of cyc embryos. As reported by Krauss, et al., Cell, supra,
neural expression of shh is abolished in cyc embryos. Strong pax-2 expression was
observed in the optic vesicles of wild-type embryos; this expression is significantly
reduced in cyc mutant embryos.

twhh RNA in cyclops embryos is found only in a small patch of cells at the presumptive
tailbud, and neural expression was not detected at any later stage examined. Neural
expression of shh is also lost in cyc mutants, although expression in the notochord is
retained (Krauss, et al., supra; data not shown).

Since the cyc mutation appears to ablate hh-expressing cells in the developing brain, this
mutation can be used as a genetic tool to examine the requirement for hh function in eye
development. Hatta, et al. (Hatta, et al., Proc. Natl. Acad. Sci. USA, 91:2061-2065,
1994) recently demonstrated that pax-6 expression is fused at the midline due to loss of
ventral midline cells that normally do not express pax-6 and that, in addition, pax-2
expression in the fused eye of cyc mutant embryos is reduced. We extended these
observations to an earlier stage when the optic vesicles first form and found that pax-2
expression is weak and fails to extend within the vesicles in cyc mutants. In conjunction
with the results of ectopic hh expression, these observations suggest that hh signaling
activity promotes and is required for the induction of proximal fates within the optic
vesicle. In this model, we propose that the protuberance acts as a proximal patterning
center for the developing zebrafish eye by providing a localized source of hh activity.

EXAMPLE 15 

hh ACTIVITY VENTRALIZES THE DEVELOPING BRAIN 

Previous work has established an important role of signals from the floor plate and
notochord in ventral patterning of the neural tube (Jessell, T.M., & Dodd, J., Cell,
69:95-110, 1992). For example, Goulding, et al., Development, 117:1001-1016, 1993, recently
demonstrated that notochord and floor plate grafts can repress the normal lateral
expression of pax-6 in the neural tube. Other recent work has implicated hh activity in
at least some aspects of ventral neural tube patterning (Echelard, et al., Cell,
75:1417-1430, 1993; Krauss, et al., supra; Roelink, et al., supra); consequently, we
examined hh-injected embryos for effects on pax-6 expression in the brain.

In the zebrafish at 22 hours of development, pax-6 is expressed in dorso-lateral regions
of the diencephalon and in a ventrolateral domain of the hindbrain and spinal cord that
excludes the floor plate and adjacent cells (Krauss, et al., supra; Puschel, et al., supra).
This pattern of expression is reciprocal to that of both twhh and shh in the diencephalon
(compare FIGURES 14q and 14i) and in the hindbrain. hh RNA injection caused
repression of pax-6 in the more ventral domain in the diencephalon, while more dorsal
expression persisted. In addition, pax-6 expression was significantly reduced ventrally
in rhombomeres 1, 2, and 4 and, in some cases, was completely abolished in these
rhombomeres. The repressing effect of ectopically expressed hh on pax-6 suggests that
the reciprocal patterns of hh and pax-6 expression in normal embryos are due to
repression of pax-6 by nearby hh-expressing cells.

Since absence of pax-6 expression is a feature of the ventral midline, repression of pax-6
in lateral positions suggests ventralization. Consequently, twhh was injected into
embryos for analysis of induction of a floor plate marker, F-spondin (Riddle, et al.,
supra). As described above, ectopic twhh induces F-spondin expression at more dorsal
levels in the midbrain and anterior hindbrain. The effects of hh upon expression of both
pax-6 and F-spondin indicate a ventralization of the brain. Adoption of ventral cell
identity by lateral cells might explain their failure to form ventricles (FIGURE 15a-f).

The ventralizing activities of twhh confirm and extend those previously reported for
shh/vhh-1 class genes of chicken, zebrafish, and rat (Echelard, et al., supra; Krauss, et
al., supra; Roelink, et al., supra). The early restriction of twhh to midline neural
progenitors, however, suggests that it may play a specific role in the homeogenetic
mechanisms of floor plate maintenance and expansion (Placzek, et al., Dev.,
117:205-218, 1993). In the zebrafish, wild type cells in cyclops hosts can contribute to
and induce adjacent cells to form floor plate, but only when the transplanted cells
populate the neural plate and not the notochord (Hatta, et al., Nature, 350:339-341,
1991). We have demonstrated that, in cyclops mutants, midline expression of twhh is lost
while shh expression is maintained in the notochord (FIGURE 18; Krauss, et al., supra
for shh); taken together, these results suggest that the homeogenetic floor plate signal
lost in the cyclops mutant may be encoded by the twhh gene. In the chick and rat, the
floor plate retains auto-inductive potential long after the loss of floor plate inducing
properties by the notochord, despite continued expression of shh/vhh-1 in the notochord
(Roelink, et al., supra; Placzek, et al., supra; Yamada, et al., Cell, 73:673-686, 1993).
Although no homologues of the twhh class have been reported in other vertebrates,
expression of other hh homologues in patterns more like those of twhh might help explain
these discrepancies.

EXAMPLE 16 

TWO DISTINCT SIGNALING PROTEINS DERIVE FROM
THE twhh-ENCODED PRECURSOR

Endogenous hh protein in Drosophila is found predominantly as an amino- and a
carboxy-terminal fragment (N and C, respectively) derived by an internal auto-proteolytic
cleavage of a larger precursor (U for uncleaved), which also occurs in vivo but at lower
levels (Lee, et al., supra). Determinants within the amino-terminal domain appear not
to be required for auto-proteolytic activity, whereas mutations affecting the
carboxy-terminal domain can block auto-proteolysis and reduce activity in vivo (Lee, et
al., supra). The auto-proteolysis is blocked by a substitution of alanine for the histidine
normally present at position 329. This histidine is absolutely invariant in alignments of
all known hh genes, and its sequence context suggests a catalytic role in auto-proteolysis
(Lee, et al., supra).

FIGURE 17 shows zebrafish twiggy-winkle hedgehog derivatives. 17(a) Cartoons of
various twhh open reading frames. SS (shaded) is the predicted N-terminal signal
sequence for secretion of these proteins and encompasses the first 27 amino acids of each
open reading frame. The arrow indicates the predicted internal site of auto-proteolytic
cleavage. Amino acid residue numbers are according to FIGURE 13b. The filled triangle
denotes the normal termination codon for the twhh open reading frame. Construct U_HA
contains a mutation that blocks auto-proteolysis (the histidine at residue 273 is changed
to an alanine; see Lee, J.J., et al., supra). Construct U_356HA contains a stop codon in
place of amino acid residue 357 as well as the H273A mutation in U_HA. Construct N
encodes just the first 200 amino acids of twhh. Construct C has had the codons for
residues 31-197 deleted. 17(b) shows in vitro translation of the expression constructs
shown schematically in part a. Constructs were translated in vitro in the presence of
35S-methionine and analyzed by autoradiography after SDS-PAGE. The protein products are
shown schematically to the left. Lanes 1 and 6: Auto-proteolysis of the full-length (U_ss)
protein creates two fragments, an N-terminal fragment (N_ss) and a C-terminal fragment
(C). Lane 2: Construct U_HA makes only an uncleaved form of twhh protein that
comigrates with U_ss ... twhh via auto-cleavage. Lane 5: Construct C encodes processed
and unprocessed forms which are visible as two bands migrating closely together. The
bottom band is the C protein made from auto-proteolysis of the U_ss(Δ31-197). All
constructs were made by in vitro mutagenesis of expression construct TUStwhh (see
FIGURE 15) using the method of RPCR. The sequences of all constructs were confirmed
by dideoxy sequencing. In vitro translations were performed according to the
manufacturer's instructions (Promega).
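
The construct definitions in the FIGURE 17 legend can be restated as operations on a protein sequence. The following is a minimal sketch only: the residue positions (27, 273, 357, 200, 31-197) are taken from the legend, while the function names, the dictionary keys, and the 400-residue placeholder sequence are hypothetical and stand in for the actual twhh sequence of FIGURE 13b, which is not reproduced here.

```python
# Residue positions from the FIGURE 17 legend (1-based numbering).
SIGNAL_SEQUENCE_LENGTH = 27   # SS: first 27 amino acids
CATALYTIC_HIS = 273           # histidine changed to alanine in U_HA
STOP_POSITION = 357           # stop codon replaces residue 357 in U_356HA
N_LENGTH = 200                # construct N: first 200 amino acids
C_DELETION = (31, 197)        # construct C: residues 31-197 deleted


def substitute(seq, position, residue):
    """Replace the 1-based residue at `position` with `residue`."""
    return seq[:position - 1] + residue + seq[position:]


def stop_at(seq, position):
    """Model a stop codon at 1-based `position`: keep residues 1..position-1."""
    return seq[:position - 1]


def delete(seq, first, last):
    """Delete 1-based residues first..last, inclusive."""
    return seq[:first - 1] + seq[last:]


def build_constructs(full_length):
    """Return the five open reading frames described in the legend."""
    if full_length[CATALYTIC_HIS - 1] != "H":
        raise ValueError("expected histidine at residue 273")
    u_ha = substitute(full_length, CATALYTIC_HIS, "A")
    return {
        "U_ss": full_length,
        "U_HA": u_ha,
        "U_356HA": stop_at(u_ha, STOP_POSITION),
        "N": full_length[:N_LENGTH],
        "C": delete(full_length, *C_DELETION),
    }


# Hypothetical placeholder, NOT the real twhh sequence: 400 residues with a
# histidine at position 273 so that the H273A substitution can be applied.
PLACEHOLDER = "M" + "G" * 271 + "H" + "G" * 127

if __name__ == "__main__":
    for name, seq in build_constructs(PLACEHOLDER).items():
        keeps_ss = seq[:SIGNAL_SEQUENCE_LENGTH] == PLACEHOLDER[:SIGNAL_SEQUENCE_LENGTH]
        print(name, len(seq), "signal sequence retained:", keeps_ss)
```

Because the C deletion begins at residue 31, every construct in this sketch keeps the first 27 residues, which matches the statement that all constructs retained the signal sequence.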

The vertebrate hh proteins encoded by shh, twhh and mouse shh/Hhg-1 also undergo
auto-proteolysis to yield two smaller species from a single larger precursor (Lee, et al.,
supra; Chang, et al., supra; see lanes 1 and 6 in FIGURE 17b). We have introduced the
invariant histidine to alanine mutation to generate a construct encoding a form of the
twhh protein that is not auto-proteolytically cleaved (U_HA). We have also introduced a
nonsense codon and deleted a segment of coding sequence to generate constructs that
produce either the amino- or the carboxy-terminal domains of twhh (N and C,
respectively; see lanes 4 and 5 in FIGURE 17b; constructs are schematically diagrammed
in FIGURE 17a). To target these proteins to the secretory pathway, all constructs
retained the normal twhh signal sequence.

Synthetic mRNAs transcribed from these constructs were injected to examine the role
of processing and to assay the activities of individual protein fragments; the results are
summarized in Table I and are based on the activities presented in FIGURE 15. The
most striking conclusion from these experiments is that N and C both exhibit activity,
and that these activities are distinguishable. Thus, although both N and C are capable of
ectopically activating pax-2 in the developing eye, thereby providing an internal
injection control, only N was capable of efficiently repressing pax-6 (FIGURE 16). Later
effects on lens development were also more extreme for N, consistent with the role of
pax-6 in lens development suggested by its mutant phenotypes in mice. (See Ton, C.C.,
et al., Cell 67:1059-1074, 1991; Glaser, T., et al., Nat. Genetics 2:232-239, 1992; Hill,
R.E., et al., Nature 354:522-525, 1991; Hogan, B.L., et al., J. Embryol. Exp. Morph.,
97:95-110, 1986; and Hogan, B.L., et al., Development, 103 Suppl.:115-119, 1988.)

In considering the activity of delta N-C, it is important to recognize the activity of
endogenous hh genes in these experiments, which is inhibited by delta N-C and
fragments thereof (see Example 18 and FIGURE 18 for further discussion).

The uncleaved U_HA protein is only somewhat less active than C in inducing pax-2, but
it also was not able to repress pax-6 efficiently (FIGURE 16). The latter is particularly
notable since a truncated form of the U_HA protein (U_356HA; see FIGURE 17a, b) has
activities not significantly different from N (FIGURE 16). Thus, in addition to carrying
determinants important for auto-proteolysis and pax-2 induction, the C-terminus also
contains a domain inhibitory to N-terminal function when in the context of the uncleaved
hh protein. The C-terminus can also inhibit N action by an intermolecular mechanism
(Lai, et al., supra). The existence of such an inhibitory domain in C suggests that if
auto-proteolysis can be modulated, such modulation might regulate the activity of hh in
vivo. This possibility highlights the importance of ascertaining the processed state of hh
proteins expressed in any particular patterning center to understand the potential hh
activities generated.

EXAMPLE 17

DUAL ROLES OF hh SIGNALING PROTEINS IN EARLY EYE
AND BRAIN PATTERNING

In understanding the normal roles of N and C in eye and brain patterning, the N and C
derivatives of the Drosophila hh gene may offer some insight. The Drosophila N
derivative is retained close to its embryonic site of synthesis in a segmentally striped
pattern (Tabata and Kornberg, Cell, 76:89-102, 1994; Taylor, et al., Mech. Dev.,
42:89-96, 1993), is cell-associated when expressed in cultured cells, and is effectively
bound by heparin agarose in vitro, suggesting the possibility of extracellular matrix
association. The C-terminal fragment, in contrast, is not bound effectively by heparin
agarose, is almost quantitatively released into the culture supernatant of expressing
cultured cells, and is only diffusely localized in embryos. Although the activities of
individual fragments have not been assayed, the biochemical differences and tissue
distributions of Drosophila N and C may account for the short- and long-range nature of
the functions associated with hh during Drosophila development.

Although the tissue distributions of zebrafish N and C are not known, their activities in
ectopic expression assays are also suggestive of short- and long-range functions when
considered in the context of normal expression patterns of hh, pax-2 and pax-6. The
normal gradient of pax-2 expression in the optic vesicle extends a substantial distance
from its maximum adjacent to the site of hh expression in the protuberance; the ability
of ectopic C to activate pax-2 therefore suggests that, consistent with the distribution of
C in Drosophila, zebrafish C may carry out a long-range function. Repression of
endogenous pax-6 expression, in contrast, appears to be a short-range function since
pax-6 expression occurs close to endogenous hh expression. Efficient repression of pax-6
is an attribute of constructs producing N, and a short-range function for N would be
consistent with the distribution of N in Drosophila.

Two types of hh-dependent activity have been reported for hh-transfected cultured cells.
One is the apparent contact-dependent induction of floor plate markers (Roelink, H., et
al., Cell 76:761-775, 1994); the second, induction of sclerotome markers in presomitic
mesoderm, is diffusible and acts at long range.

EXAMPLE 18

CHARACTERIZATION OF XENOPUS hh

1. Materials and Methods

cDNAs encoding full-length Xenopus hedgehogs, or encoding amino terminal or carboxy
terminal domains linked to secretory leader sequences, were transcribed in vitro to yield
translatable messenger RNA. The synthetic messenger RNAs, and control mRNAs, were
microinjected into the animal poles of cleavage stage Xenopus embryos, which were
allowed to develop to the blastula stage, at which time the animal cap explants were
prepared from the upper one fourth of the embryo. These blastula cap explants were then
cultured in vitro in physiological saline in the presence or absence of the transforming
growth factor beta family member, recombinant human activin A. All explants were
allowed to develop until control embryos had grown to neurula stage, or to tadpole stage.
Importantly, blastula caps left untreated differentiate from ectoderm into atypical
epidermis. Blastula caps treated with activin differentiate into mesodermal and neural
cell types. Thus, the question was whether hedgehog, or its proteolytic derivatives,
would change the differentiation of cells away from becoming epidermis, and into
another cell type. A second question was whether hedgehog can work with activin to
alter the normal response of the tissue to either factor by itself.

Explants were then extracted to yield mRNA by methods commonly used by those of
skill in the art, and the mRNA was used as template with reverse transcriptase to yield
cDNA. The cDNA was then used as template with various sets of primers for PCR for
specific genes, a procedure known as reverse transcriptase-polymerase chain reaction, or
RT-PCR. This results in specific amplification of radioactive products which are
diagnostic for the presence and level of the messenger RNAs which were present in the
explants. Samples were separated on polyacrylamide gels, which were exposed to X-ray
film to yield the bands shown in the figures. Thus, the darker bands correspond to a
greater level of the specific mRNA.

FIGURES 18A and B demonstrate that hedgehog induces pituitary and anterior brain
genes, and can cooperate with activin or with neural inducers such as noggin and
follistatin which are induced by activin to elevate expression of these genes in explanted
embryonic tissue. All odd numbered lanes lack reverse transcriptase in the RT-PCR
reaction and are negative controls. All even numbered lanes have this enzyme, and thus
give bands specific to mRNA. In Panel A, lanes 1-2 are control blastula caps, lanes 3-4
are Xenopus hedgehog-expressing blastula caps, lanes 5-6 are control blastula caps
treated with activin, lanes 7-8 are hedgehog-expressing blastula caps treated with activin,
and lanes 9-10 are prolactin-expressing blastula caps treated with activin to serve as a
control for simply expressing a secreted protein in the blastula cap. The primers used for
the assay are shown to the left of each panel, i.e., XAG 1 is a cement gland marker,
XANF1B is a pituitary marker, otx-A is an anterior brain marker, en-2 is a
midbrain-hindbrain boundary marker, krox 20 is a rhombomere-specific hindbrain marker,
XlHbox 6 is a posterior hindbrain marker, NCAM is a general neural marker, activin is a
control for mesoderm induction, and elongation factor is a positive control to show that
all even numbered lanes did in fact have cDNA present.
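
The lane convention for FIGURE 18A can be expressed as a small lookup. This is an illustrative sketch only: the sample descriptions and the odd/even rule come from the paragraph above, while the names `SAMPLES_18A` and `describe_lane` are hypothetical.

```python
# Samples in FIGURE 18A, in lane order; each sample occupies two adjacent lanes.
SAMPLES_18A = [
    "control blastula caps",
    "hedgehog-expressing blastula caps",
    "control blastula caps + activin",
    "hedgehog-expressing blastula caps + activin",
    "prolactin-expressing blastula caps + activin",
]


def describe_lane(lane):
    """Return (sample, has_reverse_transcriptase) for a 1-based lane number.

    Odd lanes lack reverse transcriptase (negative controls);
    even lanes contain it and report specific mRNA.
    """
    if not 1 <= lane <= 2 * len(SAMPLES_18A):
        raise ValueError("FIGURE 18A has lanes 1-10")
    sample = SAMPLES_18A[(lane - 1) // 2]
    has_rt = lane % 2 == 0
    return sample, has_rt


if __name__ == "__main__":
    for lane in range(1, 11):
        sample, has_rt = describe_lane(lane)
        print(lane, sample, "+RT" if has_rt else "-RT (negative control)")
```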

The panel labelled XANF1B detects a pituitary gene. Lane 4 (panel A) shows that 
hedgehog induces this pituitary marker, and thus likely pituitary cell types, in blastula 
cap explants (see also FIGURE 20, lane 6, for a stronger signal showing this), when 
25 compared to control explants in the absence of hedgehog (lane 2), which do not express 
this gene. Lane 6 shows that explants treated with activin, in the absence of hedgehog, 
also express the pituitary gene. Lane 8 shows that explants treated with both hedgehog, 
and with activin, give highest levels of the pituitary gene. Lane 10 proves that this effect 
of hedgehog is specific, since prolactin, another secreted protein, does not lead to this
elevated level of pituitary gene. 

The panel labelled OTX-A detects this anterior brain gene. Lane 4 (and 6 in FIGURE 20)
shows that hedgehog can induce this neural-specific gene. Lane 8 shows that the level
of this neural gene is highest in tissue treated with both activin and hedgehog, relative
to hedgehog alone (lane 4), or activin alone (lane 6), and control explants do not express
this gene (lane 2). Again, this effect is specific to hedgehog, since prolactin (lane 10) did
not lead to elevated expression of this gene. The panel labelled XAG-1 detects a cement
gland-specific gene, and lane 4 shows that hedgehog induces this gene at a high level.

In panel 18B, embryos were injected with N or ΔN-C, and some animal cap explants
were treated with activin before culturing until sibling embryos reached tailbud stage.
Lanes 1, 2: control animal caps from uninjected embryos. Lanes 3, 4: control animal caps
from uninjected embryos, treated with activin. Lanes 5, 6: animal caps from embryos
injected with N and treated with activin. Lanes 7, 8: animal caps from embryos injected
with ΔN-C and treated with activin. Whereas N displays activities in activin-treated
explants similar to those of X-bhh (see B), ΔN-C produces the opposite effect, decreasing
anterior and increasing posterior neural marker expression. As shown in FIGURE 18B, N
behaves like X-bhh in that it induces elevated levels of XANF-2 and Otx-A (lane 6)
relative to control activin-treated animal caps (lane 4). Moreover, N also leads to a
decrease in the expression of more posterior markers, such as krox-20 and XlHbox-6, as
observed following injection of X-bhh. In contrast to the activity of N (Fig. 4C, lane 6),
ΔN-C decreases the expression of the anterior neural genes XANF-2 or Otx-A (Fig. 4C,
lane 8) in activin-treated animal caps when compared to uninjected controls (lane 4).
Moreover, ΔN-C also leads to an increase in the expression of more posterior markers,
such as En-2 and XlHbox-6.

FIGURE 19 shows that X-bhh modifies the anteroposterior pattern of neural gene
expression in explants under the influence of endogenous neural inducers. (A) Isolation
of dorsal explants from injected embryos for the preparation of Keller sandwiches (Keller
and Danilchik, 1988; Doniach, et al., 1992; redrawn from Doniach, 1993). (B) Keller
sandwiches were made from uninjected (lanes 1 and 2) and X-bhh-injected (lanes 3 and
4) embryos, total RNA was isolated when control embryos reached stage 20, and RT-PCR
was used to analyze the expression of XAG-1 and neural markers. XAG-1 is a
cement gland marker, XANF-2 is an anterior pituitary marker, Otx-A is a forebrain
marker, En-2 demarcates the midbrain-hindbrain boundary, Krox-20 marks rhombomeres
3 and 5 of the hindbrain and XlHbox-6 is a spinal cord marker. N-CAM is a general
neural marker whose expression is not restricted along the anteroposterior axis. The
EF-1α control demonstrates that a comparable amount of RNA was assayed in each set. Note
that expression of XAG-1 and anterior neural markers is stimulated by X-bhh treatment,
whereas expression of posterior neural markers is suppressed.

FIGURE 20 is a demonstration of the differential activities of the N and C domains of
hedgehog proteins. As in FIGURE 18 above, odd numbered lanes are negative control lanes,
and even numbered lanes show specific gene expression for the markers described above.
The N domain of hedgehog is encoded in the construct called Xhhl208 (lane 8), and the
C domain is encoded in the construct called Xhhl delta 27-208 (lane 10). The construct
Xhhll-1270A (lane 12) is specifically mutated so that it is unable to undergo
self-processing. The ability of the N and C domains to induce the genes described above is
compared to control blastula cap explants (lane 4), entire embryos as a positive control
(lane 2), blastula cap explants expressing a mutated hedgehog as a negative control (lane
14), blastula caps expressing the entire hedgehog 1 (lane 6), and blastula cap explants
treated with an independent neural inducer, noggin (lane 16) (discovered by Richard
Harland at University of California at Berkeley).

Examining the first panel for the cement gland marker XAG-1 clearly shows that intact
hedgehog (lane 6), the N domain (lane 8) and the processing defective hedgehog
(lane 12) are much better at inducing the cement gland marker than is the C domain
(lane 10). Examining the second panel demonstrates that the C domain (lane 10) is better
at inducing the pituitary gene XANF1B than is the N domain (lane 8). Since the N domain
induces the XAG-1 marker better, as described above, the two results together
clearly demonstrate that the N and C domains have distinguishable activities.
Examination of the remaining panels shows that all described activities of the normal
hedgehog (lane 6) can be defined in terms of the activities of the N and C domains.

Examining the third panel, for the forebrain gene otx-A, shows that both the N domain 
(lane 8) and C domain (lane 10) induce similar levels of this gene, but the processing 
defective hedgehog (lane 12) is better than either at inducing this gene. 

Examining the fourth panel of this figure (NCAM), as well as the FIGURE 18 panels
EN-2, krox20, XlHbox6, and NCAM, shows that hedgehog does not induce these more
posterior neural genes. Notably, noggin (lane 16) is able to induce the pituitary gene and
the forebrain gene, but it also induces the general neural gene, NCAM, which hedgehog
does not. This clearly shows that hedgehog is a distinct activity from the neural inducer
noggin, and has a more restricted ability to induce neural genes.

Experiments in the Xenopus embryo were conducted by injecting full-length hedgehog
RNA, and immunoprecipitating with a C-domain specific antibody, which proves that
full length hedgehog does in fact get processed in vivo in vertebrates, consistent with the
data shown in earlier Examples in Drosophila. Thus, the ideas for the utility of detecting
hedgehog N and C domains are based on knowledge that such domains do appear through
hedgehog processing in vertebrates. Moreover, the knowledge that hedgehog processing
does occur in vivo naturally raised the question of whether the resulting N and C domains
have independent activity.

The results in FIGURE 18 are novel insofar as they establish that the activity of
hedgehog in inducing a pituitary gene, and an anterior brain gene, may be enhanced by
the TGF-β family of growth factors. This enhancement likely applies to the N and C
domains described in FIGURE 20, since the genes analyzed are the same. This
enhancement is due ... to hh synergizing with neural inducing factors which are
themselves induced by TGF-β family members, including but not limited to such molecules
as noggin and follistatin.

The data in FIGURE 20 make several important points. First, the data show that the N
and C domains have different though somewhat overlapping activities, and that the N and
C activities added together account for all of the observed activity of the intact hedgehog
protein. Thus, any clinical or diagnostic uses of hedgehog might be improved by use of
the N or C domain, as one generally wishes to use the smallest protein which has an
activity for clinical work, as it is less likely to evoke adverse immune responses, or other
adverse side effects. Second, the data show that the C domain is better than the N
domain in inducing pituitary gene expression and, since it has less induction of cement
gland genes than intact hedgehog, or N domain, it suggests that the C domain might be
useful in clinical situations where one wishes to enhance the development or expression
of the pituitary as specifically as possible. As the pituitary is the source of a number of
hormones, any treatment for enhancing pituitary cell growth and activity would ideally
have as few side effects as possible, and the C domain is thus a viable candidate for
therapies with enhanced pituitary cell growth and function in mind. Third, relating to
studies regarding noggin, FIGURE 20 shows clearly that while both hedgehog and
noggin can induce pituitary gene expression, hedgehog is more specific, since hedgehog
does not induce the general neural marker NCAM, whereas noggin induces NCAM as
well as pituitary. Fourth, the hedgehog which was mutated to prevent processing (lane
12) is as active as full-length and wild-type hedgehog (lane 6) in inducing pituitary gene
expression, but the processing defective hedgehog is better at inducing the forebrain
marker otx-A. Thus, for some clinical applications of hedgehog in inducing specific cell
types, it is possible that the processing-defective hedgehog will be superior compared to
normal hedgehog.
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FIGURE 21 shows AN-C interferes with X-bhh and N activity in animal cap explants. 
Embryos were injected with various RNAs, animal cap explants were cultured until 
sibling embryos reached tailbud (stage 25), at which time RT-PCR was used to analyze 
the expression of the cement gland marker XAG-1 and the control RNA, EF-la. Lanes 

5 1,2: control animal caps from uninjected embryos. Lanes 3, 4: animal caps from 
embryos injected with both X-bhh and prolactin RNAs. Lanes 5, 6: animal caps from 
embryos injected with box X-bhh and AN-C. Lanes 7, 8: animal caps from embryos 
injected with both N and proiacting RNAs. Lanes 9, 10: animal caps from embryos 
injected with both N and AN-C. The N and X-bhh experiments were conducted 

10 independently and thus absolute levels in lanes 3-6 should not be compared to those in 
lanes 7- 10. Note that the induction of XAG-1 expression by X-bhh or N is reduced by 
co-injection of AN-C. 

An internal deletion of X-bhh (ΔN-C) blocked the activity of X-bhh and N in explants and
reduced dorsoanterior structures in embryos. As elevated hh activity increases the
expression of anterior neural genes, and as ΔN-C reduces dorsoanterior structures, these
complementary data support a role for hh in neural induction and anteroposterior
patterning.

ΔN-C deletes amino acids 28-194 of X-bhh. The primary translation product is predicted
to undergo signal sequence cleavage removing amino acids 1-23, and to undergo
autoproteolysis. Based on the cleavage site in Drosophila hh (Porter, et al., Nature,
374:363, 1995), autoproteolysis would generate a C domain of X-bhh amino acids 198-409,
as well as a predicted seven amino acid polypeptide, representing amino acids 24-27
and 195-197 (Lai, et al., Development 121:2349, 1995). Analysis of the effect of ΔN-C
on neural markers was by standard methods including Northern blot analysis and in
situ hybridization (Lai, et al., supra, incorporated herein by reference).
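
The residue bookkeeping in the preceding paragraph can be checked with a short sketch. It is illustrative only: the constant and function names are hypothetical, the residue numbers are those quoted in the text, and the autoproteolytic cleavage is assumed to fall between residues 197 and 198, as implied by the stated C domain of 198-409.

```python
def residues(start, end):
    """Inclusive list of residue numbers from start to end."""
    return list(range(start, end + 1))

FULL_LENGTH = 409      # last residue of X-bhh, as quoted in the text
DELETION = (28, 194)   # residues removed by the internal deletion
SIGNAL_END = 23        # signal sequence cleavage removes residues 1-23
CLEAVAGE_AFTER = 197   # assumed: autoproteolysis between residues 197 and 198

def predicted_products():
    """Return (amino-terminal peptide, C domain) as lists of residue numbers."""
    deleted = set(residues(*DELETION))
    primary = [r for r in residues(1, FULL_LENGTH) if r not in deleted]
    mature = [r for r in primary if r > SIGNAL_END]
    n_peptide = [r for r in mature if r <= CLEAVAGE_AFTER]
    c_domain = [r for r in mature if r > CLEAVAGE_AFTER]
    return n_peptide, c_domain

n_peptide, c_domain = predicted_products()
print(n_peptide)                      # residues 24-27 and 195-197
print(c_domain[0], c_domain[-1])      # first and last residue of the C domain
```

Under these assumptions the amino-terminal fragment contains seven residues, in agreement with the seven amino acid polypeptide predicted above.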

Although ΔN-C does not induce the cement gland marker XAG-1, it decreases the
expression of anterior ectodermal and neural markers in activin-treated animal caps.
Thus, ΔN-C has the capacity to affect neural patterning. ΔN-C also promotes an increase
in posterior neural markers in activin-treated animal caps. Mixing ΔN-C with N or full
length X-bhh at a 1:1 ratio led to a dramatic inhibition of the induction of cement gland
in animal cap assays, supporting the hypothesis that ΔN-C interfered with X-hh.

EXAMPLE 19

CHOLESTEROL MODIFICATION OF HEDGEHOG POLYPEPTIDE 

In addition to peptide bond cleavage, Hh autoprocessing causes the covalent attachment
of a lipophilic adduct to the COOH-terminus of Hh-Np (J.A. Porter et al., Cell 86, 21,
1996). This modification is critical for the spatially restricted tissue localization of the
Hh signal; in its absence, the signaling domain exerts an inappropriate influence beyond
its site of expression (J.A. Porter et al., Cell 86, 21, 1996). Physical and biochemical
characterization of this lipophilic adduct indicates that it is not the glycosyl phosphatidyl
inositol (GPI) anchor, the only other known lipophilic modification associated with
secreted cell surface proteins in eukaryotes (S. Udenfriend and K. Kodukula,
Annu. Rev. Biochem. 64, 563, 1995; and P.J. Casey, Science 268, 221, 1995).

In vitro studies of Hh autoprocessing were performed using a bacterially expressed
derivative of the Drosophila protein, His6Hh-C, in which the majority of the amino-
terminal signaling domain and the signal sequence are replaced by a hexa-histidine tag.
Cleavage of this protein occurs between residues corresponding to Gly 257 and Cys 258
(J.A. Porter et al., Nature 374, 363, 1995) and likely proceeds through a labile thioester
intermediate formed by the cysteine thiol and the glycine carbonyl carbon. In the
presence of high concentrations of thiols or other small molecules with strongly
nucleophilic properties at neutral pH, cleavage of the peptide results from nucleophilic
attack upon the thioester carbonyl, causing displacement of the thiol group and formation
of an adduct to Gly 257 by the attacking nucleophile (Fig 22A). Thus, in reactions with
50 mM dithiothreitol, in vitro cleavage of His6Hh-C proceeded to greater than 50%
completion within three hours at 30°C (Fig. 22B). At 1 mM dithiothreitol, however, the
reaction yielded no visible cleavage product (Fig 22B).



Figure 22 shows lipid stimulation of Hh autoprocessing in vitro. Panel A illustrates the
mechanism of Hh processing. The reaction is initiated by formation of a thioester
between the thiol side chain of cysteine 258 and the carbonyl carbon of glycine 257, an
N to S shift. This activated intermediate then undergoes a nucleophilic attack by DTT in
vitro or by a lipophilic nucleophile in vivo, resulting in cleavage as well as formation
of a covalent adduct at the carboxy-terminus of the amino-terminal product. X denotes
the attacking nucleophile. Panel B shows a Coomassie blue-stained SDS-polyacrylamide
gel showing in vitro autocleavage reactions of the bacterially expressed His6Hh-C
protein (~29 kD) incubated for 3 hours at 30°C with no additions (lane 1), 50 mM DTT
(lane 2), 1 mM DTT (lane 3), or 1 mM DTT plus bulk S2 cell lipids (lane 4). The Hh-C
product of the autoprocessing reaction migrates as an ~25 kD species (lanes 2 and 4); the
~5 kD NH2-terminal product is not resolved in this gel.



The in vivo reaction resulted in lipophilic modification of the NH2-terminal signaling
domain. The most direct mechanism by which this could occur, by analogy to the in vitro
mechanism (Fig 22A), would be for a lipid to function as the displacing nucleophile in
attack of the thioester. To explore this possibility, bulk lipids extracted from Drosophila
S2 cultured cells (I. Schneider, J Embryol Exp Morph 27, 353 (1972); and F.M. Ausubel
et al., Current protocols in molecular biology (Greene Publishing Associates and Wiley-
Interscience, New York, 1995)) were added to the in vitro processing reaction in the
presence of 1 mM dithiothreitol. Cleavage was observed and the reaction proceeded to
20% completion in a three hour period (Fig 22B). The reaction continues beyond this
time and reaches ~50% completion by 18 hours.

To identify the components active in the reaction, the bulk S2 lipids were separated into
two classes, neutral and complex, by silicic acid column chromatography (W.W. Christie,
Lipid analysis (Pergamon, Oxford, ed.2nd, 1982)).

Figure 23A is a thin layer chromatography (TLC) plate coated with silica gel G (Merck)
showing the fractionation of bulk S2 cell lipids using a heptane:ether:formic acid
solvent (80:20:2). Six major spots are visualized by acid charring and are indicated by
letters A-F. Figure 23B is a Coomassie blue-stained SDS-polyacrylamide gel showing in
vitro autocleavage reactions of the bacterially expressed His6Hh-C protein incubated
with 1 mM DTT plus either unfractionated S2 cell lipids (lane 1), or spots A through F
(lanes 2-7, respectively). Addition of lipid spot B but no other resulted in processing of
His6Hh-C protein. Figure 23C is a TLC of S2 cell lipids (lane 1) along with selected lipid
standards: phosphatidylcholine (lane 2), a diacylglycerol (lane 3), cholesterol (lane 4),
stearic acid (lane 5), a triacylglycerol (lane 6), and cholesteryl ester (lane 7). Lipid spot
B comigrates with cholesterol, as also demonstrated by mixing radio-labeled cholesterol
with S2 lipids before TLC fractionation. Figure 23D is a Coomassie blue-stained
SDS-polyacrylamide gel showing that, relative to 1 mM DTT alone (lane 1), cholesterol
(0.35 mM) + 1 mM DTT (lane 2) stimulates His6Hh-C autocleavage in vitro. Figure 23E is
an autoradiogram of electrophoretically-resolved products of His6Hh-C autocleavage
reactions driven by 20 mM DTT (lane 1) or 1 mM DTT + 0.35 mM cholesterol (lane 2). For
lane 1, [3H]cholesterol (3 µCi) was added at the end of the incubation period just prior
to electrophoresis; for lane 2, [3H]cholesterol was present throughout the incubation
period and is incorporated into the amino-terminal product of the reaction. To resolve
the ~5 kD product of His6Hh-C autocleavage, reaction products were separated in 17%
SDS-polyacrylamide gels.

The activity was found exclusively in the neutral class, so the lipids were subjected to
preparative thin layer chromatography (TLC) using a solvent system that resolved neutral
lipids (W.W. Christie, Lipid analysis (Pergamon, Oxford, ed.2nd, 1982)) (Fig 23A). Lipid
spots were visualized with iodine vapor or acid charring, and adsorbent at the
corresponding positions of identical uncharred plates was excised and extracted with
chloroform/methanol/water. Only lipids extracted from spot B displayed stimulatory
activity in the in vitro cleavage reaction (Fig 23B).

With the use of various lipid standards, it was found that spot B comigrated with
cholesterol (Fig 23C). In addition, the active S2 cell-derived lipid displayed the same
mobility as cholesterol in two other solvent systems and gave a positive color test when
sprayed with a specific reagent that reacts with sterols (W.W. Christie, Lipid analysis
(Pergamon, Oxford, ed.2nd, 1982); and R.R. Lowry, Journal of Lipid Research 9, 397,
1968). Taken together these results imply that the active lipid component is in the sterol
fraction of the S2 lipids. Indeed, it was found that cholesterol, which is the principal
sterol in eukaryotic cell membranes (W.W. Christie, Lipid analysis (Pergamon, Oxford,
ed.2nd, 1982)), displayed stimulatory activity similar to that observed with lipids
extracted from spot B when added in pure form to the in vitro processing reaction (Fig
23D). To establish that the stimulatory activity of cholesterol is a result of its
participation as a modifying group, it was shown that 3H-labeled cholesterol added to
the 1 mM dithiothreitol reaction was incorporated into the NH2-terminal product (Fig 23E).
No incorporation was seen, however, when [3H]cholesterol was added just prior to
electrophoresis to a reaction incubated for 3 hours with 20 mM dithiothreitol (Fig 23E).
Also consistent with covalent cholesterol addition, the NH2-terminal fragment of His6Hh-C
generated by the cholesterol-driven reaction migrated just beneath the 6 kD marker,
whereas the product of the reaction driven by 20 mM dithiothreitol migrated just above
this marker (Fig 24A). Such a shift in mobility, thought to result from an increased
capacity for SDS binding to the covalently linked lipid (M.L. Cardoso de Almeida and
M.J. Turner, Nature 302, 349, 1983), was also noted for Hh-Np as compared to the
precisely truncated NH2-terminal fragment (Hh-N, truncated following Gly 257).

The part of the sterol most likely to act as the attacking nucleophile is the 3β hydroxyl.
Such an attack would leave cholesterol as a covalent adduct in ester linkage to the
carboxylate of the terminal residue of the NH2-terminal fragment (Gly 257).

Figure 24A shows Coomassie-stained gels of His6Hh-C autocleavage reactions carried
out in the presence of 20 mM DTT (lane 1), or 1 mM DTT + 0.35 mM cholesterol (lane 2).
Lane 3 contains a mixture of the samples loaded in lanes 1 and 2. The amino-terminal
product of the cholesterol-driven reaction migrates approximately 2 kD faster than the
DTT-driven reaction fragment. Figure 24B shows Coomassie-stained gels of the protein
products of His6Hh-C autocleavage reactions carried out in the presence of 1 mM
DTT + 0.35 mM cholesterol (lanes 1 and 2) or with 20 mM DTT (lane 3). Prior to loading
the gel, samples in lanes 2 and 3 were incubated for 60 minutes with 50 mM KOH in 90%
methanol (M.C. Field and A.K. Menon, in Lipid modification of proteins, N.M. Hooper,
A.J. Turner, Eds. (Oxford University Press, New York 1992) pp.155). Base treatment
causes the cholesterol-driven amino-terminal reaction product to comigrate with the
corresponding DTT-driven reaction product. Figure 24C is an autoradiogram of
immunoblotted Hh amino-terminal domains purified from cultured S2 cells. Amino-
terminal domains were derived either from a construct truncated after glycine 257 (Hh-N,
lanes 1, 3, 4, 8, and 9) or from a construct encoding wild-type Hh that produces the amino-
terminal domain via the processing reaction (Hh-Np, lanes 2, 3, 5, 6, 7, 8, and 9). Proteins
were either directly loaded (lanes 1 and 2) or base-treated (M.C. Field and A.K. Menon,
in Lipid modification of proteins, N.M. Hooper, A.J. Turner, Eds. (Oxford University
Press, New York 1992) pp.155) for 5 minutes (lane 5), 20 minutes (lane 6) or 1 hour
(lanes 7 and 4) prior to electrophoresis. Lane 3 contains a mixture of the samples loaded
in lanes 1 and 2, lane 8 contains a mixture of the samples loaded in lanes 7 and 4, and
lane 9 contains a mixture of the samples loaded in lanes 7 and 2. Upon base treatment,
Hh-Np undergoes a shift in mobility from 18.5 kD to 19.5 kD, the mobility of the
unmodified Hh-N protein.

Ester bonds are subject to hydrolysis in alkaline conditions, and base treatment prior to
electrophoresis indeed reduced the migration of the cholesterol-driven reaction product
to a position coinciding with that of the dithiothreitol-driven reaction product. These
results are consistent with stimulation of the in vitro processing reaction by direct
nucleophilic attack of cholesterol on the thioester intermediate to form an ester-linked
adduct. If processing of Hh also results in formation of an ester-linked cholesterol adduct
in vivo, then the protein-lipid linkage should be subject to base hydrolysis with a
concomitant shift in electrophoretic mobility of the protein (normally 18.5 kD). The
immunoblot in Fig 24C shows the base-induced appearance of a species of reduced
mobility (19.5 kD), which increased in abundance from ~1/3 of the total after five
minutes of treatment to most of the immunoreactive protein after one hour. This novel
species comigrated with truncated, unprocessed Hh-N, which is not affected by base
treatment. These data are consistent with an ester bond as the protein-lipid linkage in
Hh-Np.

To confirm the involvement of cholesterol in formation of the Hh-Np adduct in vivo, S2
cells containing an inducible wild-type Hh construct were metabolically labeled with
[3H]cholesterol.

Figure 25A is an autoradiogram of a gel loaded with total cell proteins from S2 cells
containing a stably integrated Cu++-inducible hedgehog gene. Prior to harvesting, these
cells were grown in media supplemented with [3H]cholesterol in the absence (lane 1) or
presence (lane 2) of 1 mM CuSO4. [3H]cholesterol incorporation is dependent upon Cu++
induction (lane 2) and is restricted to a single protein species migrating at a position
corresponding to Hh-Np. Figure 25B is an HPLC profile of sterols separated on a C18
column by isocratic elution with a solvent containing methanol:ethanol:water (86:10:4)
(R.J. Rodriguez and L.W. Parks, Methods in Enzymology 111, 37, 1985). ~5 µg of each
sterol was mixed and loaded, and elution was monitored by absorbance at 210 nm. The
structure of cholesterol is shown above the cholesterol peak. Other sterols include:
1) desmosterol, which contains one additional double bond between carbons 24 and 25;
2) 7-dehydrocholesterol, which contains one additional double bond between carbons 7
and 8; 3) campesterol, which contains an additional methyl group on carbon 24; and
4) sitosterol, which contains an additional ethyl group on carbon 24. Figure 25C shows
HPLC analysis as in (B) of the adduct released by base treatment of Hh-Np metabolically
labeled with [3H]cholesterol (A). The radioactive species recovered from the
metabolically labeled protein coelutes with cholesterol. Figure 25D shows metabolic
labeling of vertebrate Sonic hedgehog protein with [3H]cholesterol: an autoradiogram of
a gel loaded with total cell proteins from COS-7 cells transfected with a wild-type Sonic
hedgehog expression construct (Shh, lane 1) or a construct that generates an unprocessed
amino-terminal protein truncated after the conserved glycine at the site of autocleavage
(Shh-N, lane 2). The COS-7 cells were incubated in culture medium supplemented with
[3H]cholesterol for 24 hours prior to and 36 hours after transfection (COS-7 cells grown
at 37°C in DMEM supplemented with 10% fetal calf serum were plated at ~35% confluence
onto two 35 mm dishes in 1 ml of Optimem media (Gibco) containing 1.5% fetal bovine
sera and 25 µCi of [3H]cholesterol, giving ~40 µg/ml as the final concentration of
cholesterol with a specific activity of 2 Ci/mmol (labeling medium). After 24 hours the
labeling medium was removed and the cells were transfected for 6 hours with Shh or
Shh-N expression constructs using lipofectamine (Gibco) and serum-free DMEM media.
After transfection, 1 ml of fresh labeling medium was added to each dish and the cells
were incubated for 36 hours at 37°C. The cells were then harvested without washing,
lysed on the plate with Tris buffered saline plus 1% Triton X-100, and the total cell
proteins were precipitated with acetone, washed and analyzed as described above for the
S2 cell proteins). A strongly labeled species is seen with the Shh but not the Shh-N
construct. Several other less heavily labeled species are apparent in both lanes, and
may represent other cholesterol-modified proteins.

After 48 hours of growth in the presence of [3H]cholesterol, induced and uninduced
cultured cells were detergent extracted and total cell proteins were subjected to SDS-
PAGE followed by fluorography (Metabolic labeling of S2 cultured cells with
[3H]cholesterol was performed essentially as described (Silberkang, et al., J. Biol. Chem.
258:8503, 1983). Briefly, cells containing a stably integrated Cu++-inducible hedgehog
gene were grown at 23°C for two weeks in Schneider cell media (Gibco) containing
5% fetal bovine serum depleted of lipoprotein (low cholesterol media, ~20 µg/ml
cholesterol). These cells were then plated at 40% confluence onto two 35 mm tissue
culture dishes (Nunc) in 1 ml of low cholesterol media supplemented with 300 µCi of
labeled cholesterol, [1,2,6,7-3H(N)], 65 Ci/mmol (NEN), giving a specific activity for
cholesterol in this medium of ~5 Ci/mmol. After 24 hours (1 doubling time) one plate
of cells was induced to express Hh protein by the addition of CuSO4 (1 mM final
concentration). After an additional 24 hours the cells from both dishes were harvested,
lysed in Tris buffered saline containing 1% Triton X-100, and total cell protein was
precipitated with 5 volumes of cold acetone. The protein pellet was resuspended in 2%
SDS in H2O and reprecipitated with acetone several times to remove unincorporated
radioactivity prior to loading onto SDS polyacrylamide gels for analysis. Initial labeling
experiments in which 25 µCi of cholesterol was added resulted in a ~10-fold decrease in
the extent of label incorporated into the inducible Hh-Np protein). Whereas uninduced
cells showed no incorporation of [3H]cholesterol into cellular proteins, cells induced to
express Hh showed a single strong band with a mobility corresponding to that of Hh-Np
(Fig 25A). Given the hydrophobic character of Hh-Np, these results suggest that either
cholesterol itself or a sterol derivative constitutes the lipophilic adduct of Hh-Np. To
determine whether cholesterol is the final form of the adduct, radio-labeled Hh-Np
protein excised from a gel was base-treated to release the adduct, which was then isolated
by ether extraction (HPLC analysis of the Hh-Np adduct involved gel isolation of the
radioactive band, KOH/methanol treatment of the band to break the ester linkage as
described, followed by neutralization of the solution with acetic acid, drying in a
speedvac, resuspension in H2O and extraction of the hydrophobic radioactivity with ether.
After evaporation of the ether the sample was resuspended in isopropanol and applied
to the C18 column for analysis). Radio-labeled adduct was then subjected to analysis by
HPLC with a method specifically designed to resolve various sterols (R.J. Rodriguez and
L.W. Parks, Methods in Enzymology 111, 37, 1985) (Fig 25B). The radioactive adduct
released from Hh-Np eluted at the same position as the cholesterol standard, and no
radioactivity was detected in any other fraction (Fig 25C).
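
The specific activity quoted for the S2 labeling medium can be reproduced with a short calculation. The sketch below is illustrative only. It assumes that the label (300 µCi at 65 Ci/mmol) is diluted into 1 ml of medium already containing about 20 µg/ml unlabeled cholesterol, and it uses the molecular mass of cholesterol quoted elsewhere in this document (386.6). The function name is hypothetical.

```python
CHOLESTEROL_MW = 386.6  # g/mol, the value quoted in this document

def medium_specific_activity(label_uci, label_ci_per_mmol,
                             unlabeled_ug_per_ml, volume_ml):
    """Specific activity (Ci/mmol) of cholesterol after adding label to medium."""
    # µCi divided by Ci/mmol gives nmol of labeled cholesterol
    labeled_nmol = label_uci / label_ci_per_mmol
    # µg divided by g/mol gives µmol; multiply by 1000 for nmol
    unlabeled_nmol = unlabeled_ug_per_ml * volume_ml / CHOLESTEROL_MW * 1000.0
    total_nmol = labeled_nmol + unlabeled_nmol
    # µCi per nmol is numerically equal to Ci per mmol
    return label_uci / total_nmol

sa = medium_specific_activity(300.0, 65.0, 20.0, 1.0)
print(round(sa, 1))  # close to the ~5 Ci/mmol stated in the text
```

Under these assumptions the labeled cholesterol contributes under a tenth of the total cholesterol, and the result is consistent with the approximate value of 5 Ci/mmol given above.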

The amount of radioactive cholesterol incorporated is consistent with that expected if all
of the Hh-Np synthesized upon induction received a cholesterol adduct (The specific
activity of [3H]cholesterol in the S2 cell labeling medium was ~5 Ci/mmol. Assuming
after a 24 hour doubling time that this concentration approximately represents that within
the S2 cell membrane, then any protein subsequently expressed and receiving cholesterol
as an adduct would also be labeled at the same specific activity. As determined by
standardized Coomassie blue staining, ~50-100 ng or 2.5 to 5 picomoles of Hh-Np is
produced by one 35 mm dish of S2 cells containing the Cu++-inducible Hh construct
during 24 hours of induction with 1 mM CuSO4 (13). This predicts ~12.5 to 25 nCi or
2.75 x 10^4 to 5.5 x 10^4 dpm of radioactivity would be incorporated into Hh-Np protein
produced in our labeling experiment assuming it is cholesterol modified. Total
incorporation of radioactivity into Hh-Np during the in vivo labeling experiment
described above was measured at ~5 x 10^4 dpm by excision and scintillation counting
of an Hh-Np gel band), suggesting that other cellular components do not compete
effectively as nucleophilic adducts in the in vivo autoprocessing reaction. Also
consistent with a homogeneous adduct, the mass of cholesterol is consistent with the mass
previously measured by mass spectrometry of processed protein purified from cultured
cells. A recent MALDI mass spectral analysis gave a mass of ~430 daltons for the
Hh-Np adduct, ~9% larger than the mass of cholesterol (386.6). Detection of this
modification required that Hh-Np be treated with CNBr/70% formic acid, i.e. full length
Hh-Np could not be detected. The mass discrepancy noted above could be accounted for
by the net addition of formic acid (45 daltons) during CNBr digestion. This reaction
could involve the addition of H2O across the 5,6 double bond of cholesterol, a common
reaction of secondary alkenes in strong acids [R.T. Morrison, R.N. Boyd, Organic
Chemistry (Allyn and Bacon, Boston, ed.3rd, 1973)], followed by esterification of
formate via this newly formed alcohol [B.I. Cohen, G.S. Tint, T. Kuramoto, E.H.
Mosbach, Steroids 25, 365-378, 1975]. To test whether the sterol backbone could be
modified by the CNBr treatment, a positively charged cholesterol derivative
(3β(N-(N',N'-dimethylamino)ethanecarbamoyl)-cholesterol, Sigma) detectable by MALDI
was examined. It was found that incubation of this sterol derivative in 70% formic acid
alone resulted in the addition of 45 mass units to the sterol (13), a mass consistent with
the net addition of a formic acid molecule). These in vitro and in vivo results show that
the Hh-C processing domain functions as a cholesterol transferase; as a result of this
activity, a cholesterol adduct is attached via an ester linkage to the COOH-terminus of
the NH2-terminal signaling domain of the Hh protein.
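
The expected incorporation quoted in the preceding paragraph follows from the specific activity and the amount of protein. The sketch below is illustrative only. It assumes one cholesterol per Hh-Np molecule, labeled at the specific activity of the medium, and uses the standard conversion of 2220 dpm per nCi. The function name is hypothetical.

```python
DPM_PER_NCI = 2220.0  # 1 nCi = 37 decays per second = 2220 decays per minute

def expected_label(pmol_protein, specific_activity_ci_per_mmol):
    """Return (nCi, dpm) expected for one labeled adduct per protein molecule."""
    # pmol multiplied by Ci/mmol gives nCi
    nci = pmol_protein * specific_activity_ci_per_mmol
    return nci, nci * DPM_PER_NCI

low_nci, low_dpm = expected_label(2.5, 5.0)    # lower bound of the protein estimate
high_nci, high_dpm = expected_label(5.0, 5.0)  # upper bound of the protein estimate
print(low_nci, high_nci)
print(low_dpm, high_dpm)
```

With the exact conversion factor the predicted range is about 2.8 x 10^4 to 5.6 x 10^4 dpm, slightly above the rounded figures in the text, and the measured value of about 5 x 10^4 dpm falls within it.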

To test whether processing of vertebrate hedgehog proteins results in the incorporation
of cholesterol as a covalent adduct to the signaling domain, cultured green monkey
kidney cells (COS-7) were metabolically labeled with [3H]cholesterol and transfected
with expression constructs containing (i) the full length murine Sonic hedgehog (Shh)
open reading frame, leading to production of an autocatalytically processed signaling
domain (Shh-Np) or (ii) Shh coding sequences precisely truncated at the site of cleavage,
thus producing an unprocessed amino terminal signaling domain (Shh-N) (COS-7 cells
grown at 37°C in DMEM supplemented with 10% fetal calf serum were plated at ~35%
confluence onto two 35 mm dishes in 1 ml of Optimem media (Gibco) containing 1.5%
fetal bovine sera and 25 µCi of [3H]cholesterol, giving ~40 µg/ml as the final
concentration of cholesterol with a specific activity of 2 Ci/mmol (labeling medium).
After 24 hours the labeling medium was removed and the cells were transfected for 6
hours with Shh or Shh-N expression constructs using lipofectamine (Gibco) and
serum-free DMEM media. After transfection, 1 ml of fresh labeling medium was added to
each dish and the cells were incubated for 36 hours at 37°C. The cells were then
harvested without washing, lysed on the plate with Tris buffered saline plus 1% Triton
X-100, and the total cell proteins were precipitated with acetone, washed and analyzed
as described above for the S2 cell proteins). Cells expressing the full length construct
contained a prominent radio-labeled species migrating at ~19 kD, suggesting that
cholesterol is covalently added to Shh-Np (Fig. 25D). This band was not present in
cultures expressing the truncated Shh-N protein (Fig. 25D), indicating that the
incorporation of [3H]cholesterol is dependent on the presence of the Shh processing
domain. These data strongly suggest that the ability to attach cholesterol as a covalent
adduct during autocatalytic processing and cleavage is a universal property of Hh
proteins. Several other protein species in addition to the Shh amino terminal domain
also appeared to incorporate cholesterol in cells transfected with either construct,
suggesting that covalent modification by cholesterol extends to proteins beyond the Hh
family. This possibility is consistent with the recently reported occurrence of several
sequences homologous to the Hh processing domain in association with amino terminal
sequences distinct from hedgehog.

EXAMPLE 20

An experimental model for holoprosencephaly derives from the occurrence of epidemics
of congenital craniofacial malformations among newborn lambs on sheep ranches in
several National Forests of the western United States. The most dramatically affected
lambs showed severe holoprosencephaly, including true cyclopia and other craniofacial
malformations characteristic of holoprosencephaly. The occurrence of these defects was
traced to grazing by pregnant ewes on the range plant Veratrum californicum. The
compounds responsible were identified as a family of steroidal alkaloids; the structures
of two of these, cyclopamine and jervine, are shown as compared to cholesterol in Figure
33. In Figure 33, sterols were extracted and analyzed by HPLC from COS7 cells
metabolically labelled with [3H]-mevalonic acid in the presence or absence of jervine,
a teratogenic plant steroidal alkaloid. In the presence of 28 µM jervine, radiolabeled
cholesterol levels were reduced and another radiolabeled sterol was found to accumulate.
On the basis of its retention time in this reverse phase HPLC method, this abnormal
sterol is tentatively identified as zymosterol, an intermediate in the cholesterol
biosynthetic pathway.

Given the structural similarities of these compounds to cholesterol and the similar
teratogenic effects of cholesterol synthesis inhibitors upon the offspring of pregnant rats,
a reasonable mechanism to consider for the effects of these plant sterol derivatives was
the inhibition of cholesterol biosynthesis. Accordingly, COS7 cultured cells treated with
jervine were tested for defects in cholesterol biosynthesis by labelling with
[3H]-mevalonic acid and then extracting and analyzing radiolabeled, non-saponifiable
lipids.

Metabolic labeling and sterol analysis was essentially as described (Popjak et al., J. Biol.
Chem. 264: 6230-6238, 1989; Rilling et al., 1993, Arch. Biochem. Biophys. 301: 210-215),
with minor modifications. Briefly, COS-7 cells were plated at ~35% confluence into two
60 mm dishes at 37°C in 4 ml each of Dulbecco's modified Eagle's medium (DMEM)
supplemented with 10% fetal bovine serum (FBS). After 24 hr of growth the medium in
each dish was replaced with 2 ml fresh medium with 10% FBS; [3H]-mevalonic acid
(NEN #NET 176) brought to a specific activity of 0.8 Ci/mmol in a 1% solution of
bovine serum albumin was added to this medium to a final concentration of 20 µM. At
this time, one dish received 6 µl of a 4 mg/ml solution of jervine in ethanol (final
concentration 28 µM jervine), and the other received 6 µl of ethanol. After 24 hr further
incubation, cells were washed in PBS, extracted with methanol, and 1 M potassium
hydroxide (KOH) added to 10%. Following a three hour incubation at 60°C, the
methanol/KOH mixture was extracted with diethyl ether, the extract dried down,
resuspended in isopropanol, and subjected to reverse phase HPLC analysis by the method
of Rodriguez and Parks (Methods in Enzymology 111: 37-51, 1985).
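
The stated final jervine concentration can be checked by simple dilution arithmetic, which also supports reading the added volume as microliters and the final concentration as micromolar. The sketch below is illustrative only. The molecular mass of jervine (425.6 g/mol, from the formula C27H39NO3) is not given in the source and is an assumption here, and the function name is hypothetical.

```python
JERVINE_MW = 425.6  # g/mol for C27H39NO3; assumed, not stated in the source

def final_micromolar(stock_mg_per_ml, added_ul, medium_ml, mw):
    """Final concentration (µM) after adding a small stock volume to medium."""
    mass_ug = stock_mg_per_ml * added_ul   # mg/ml multiplied by µl gives µg
    ug_per_ml = mass_ug / medium_ml        # the small added volume is neglected
    # µg/ml equals mg/l; dividing by g/mol gives mM, so multiply by 1000 for µM
    return ug_per_ml / mw * 1000.0

conc = final_micromolar(4.0, 6.0, 2.0, JERVINE_MW)
print(round(conc, 1))  # about 28 µM
```

Adding 6 µl of the 4 mg/ml stock to 2 ml of medium gives 12 µg/ml, or about 28 µM, matching the concentration quoted in the text.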

Treated cells synthesized reduced levels of cholesterol and accumulated increased levels
of another sterol that we have provisionally identified as the cholesterol precursor,
zymosterol. The natural product jervine at these concentrations thus inhibits cholesterol
biosynthesis in cultured cells in much the same manner as the synthetic drugs discussed
above, although the specific enzyme(s) affected appear to differ. Given the similarities
in their teratogenic effects, this inhibition seems likely to underlie the teratogenic effects
of both the synthetic and natural compounds.

Protein expression and purification: Drosophila melanogaster Hh protein in which most of the
amino-terminal signaling domain and signal sequence have been replaced by a hexa-histidine tag
(His6Hh-C25) was expressed as previously described (Porter et al., 1995). SeMet His6Hh-C25
was prepared by expression in E. coli strain B834 (DE3) pLysS, a methionine auxotroph, and
growth in minimal media as previously described (Leahy et al., 1994). This His6-tagged protein
was purified on a Ni++-NTA agarose column and autocleavage stimulated by addition of 50 mM
DTT. After removal of the DTT by dialysis, the cleaved protein was passed over a Ni++-NTA
agarose column and the Hh carboxy-terminal domain, Hh-C25, collected in the column run
through. Hh-C25 was subjected to limited proteolysis by overnight incubation with 1:500 (w:w)
subtilisin (Boehringer Mannheim) at 4°C. A protease-stable fragment of approximately 17 kDa,
Hh-C17, was identified by SDS-PAGE and purified by anion-exchange chromatography utilizing a
Mono-Q column (Pharmacia). The amino- and carboxy-terminal residues of Hh-C17 were
determined to be Cys-258 and Ser-408, respectively, by mass spectral analysis of cyanogen
bromide-cleaved fragments. Mass spectral analysis was performed as previously described (Porter
et al., 1996a).

Crystallization: Crystals were grown from hanging drops by the method of vapor diffusion
(Wlodawer et al., 1975). 6 µl of a 14 mg/ml solution of Hh-C17 in 1.4 mM β-mercaptoethanol
were mixed with 2 µl of a 1:1 dilution of reservoir solution (20% PEG 3350, 110 mM ammonium
sulfate, and 10 mM sodium cacodylate, pH 5.2) with distilled water and equilibrated over the
reservoir solution. Crystals typically grew to a final size of 0.2 mm x 0.2 mm x 0.1 mm over 3-7
days. Crystals are in space group I2₁3 with unit cell dimension a=b=c=101.54 Å.

Data collection and processing: All data were collected from crystals soaked in mother liquor made
10% (w/v) ethylene glycol and flash frozen in a gaseous nitrogen stream at -180°C. MAD data
were collected at four wavelengths from a single SeMet crystal at beamline X-4A of the National
Synchrotron Light Source at Brookhaven National Laboratory. Data were collected using Fuji
HR-V phosphor-imaging plates and digitized using a Fuji BA-3000 scanner. 2° oscillations at φ
and φ+180° were collected with no overlap for each oscillation range at each wavelength. All
diffraction images were processed using the program DENZO and scaled with the program
SCALEPACK (Otwinowski and Minor, 1997). ⟨I+⟩ and ⟨I-⟩ were used for MAD phase
determination and partially recorded reflections were used in all cases. Diffraction data from
different wavelengths were scaled with WVLSCL, and values for FA and optimal f' and f'' were
calculated with MADLSQ (Hendrickson, 1991). Data collection statistics are shown in Table 1.

Structure determination: Three selenium sites were deduced from FA amplitudes using both the
program SHELXS (Sheldrick, 1991) and Patterson methods. MAD phase determinations were
made with the program MLPHARE (Collaborative Computational Project, 1994; Ramakrishnan
and Biou, 1997), and solvent-flattening and histogram-matching were performed with the program
DM (Collaborative Computational Project, 1994). An atomic model consisting of Cys-258 to
Tyr-401 was readily built into electron density maps computed with MAD-derived phases for
reflections in the range 20.0-2.0 Å using the program "O" (Jones et al., 1991). One round of
simulated annealing and several rounds of Powell minimization using X-PLOR (Brünger, 1992)
alternated with model building with "O" yielded the current model of Hh-C17 consisting of 145
residues, Cys-258 to Ala-402, and 126 water molecules. The model was refined using the data
collected at 0.9919 Å. One molecule is present in the asymmetric unit, and the solvent content is
approximately 59%. All backbone torsion angles are within energetically acceptable regions. No
electron density was observed for residues 403 to 408, but additional electron density was
observed near the thiol group of Cys-258. As the crystallization buffer contained cacodylic acid,
both AsO(CH3)2 and an As atom were modeled in this density, but neither the crystallographic
R-factor nor the free R-factor improved with these atoms added to the refinement and no atoms
have been included in this region in the final atomic model.
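
The residue ranges quoted for the model can be cross-checked with inclusive-range arithmetic. The following is a minimal sketch for illustration only; the helper name `span_length` is an assumption introduced here, and the residue numbers are those given in the text (refined model Cys-258 to Ala-402, Hh-C17 fragment Cys-258 to Ser-408, no density for residues 403 to 408).

```python
def span_length(first: int, last: int) -> int:
    """Number of residues in the inclusive range first..last."""
    if last < first:
        raise ValueError("last residue must not precede first residue")
    return last - first + 1


# Residue numbers as stated in the text (Drosophila Hh numbering).
model = span_length(258, 402)        # refined atomic model
fragment = span_length(258, 408)     # Hh-C17 fragment defined by mass spectrometry
disordered = span_length(403, 408)   # residues with no observed electron density

print(model, fragment, disordered)
```

Under these numbers the modeled and disordered residues together account for the full Hh-C17 fragment.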

each mutated to alanine (H329A, T326A, and D303A, respectively) and Leu-409 was mutated to a
stop codon (His6Hh-C17) by the method of recombinant circle PCR (Jones and Winistorfer,
1992). His6-tagged proteins containing residues 83-471 of Drosophila Hh protein with residues
89-254 deleted and with the mutated residues were expressed in E. coli and purified to near
homogeneity as previously described (Porter et al., 1995). Autocleavage activity of the mutant
proteins was assessed in 15 µl reactions by incubating 1 µg of protein in 150 mM NaCl, 100 mM
Tris-HCl (pH 7.4), 0.05% Triton X-100, 1.25 mM β-mercaptoethanol, and 2.5% glycerol with
either 50 mM DTT or 350 µM cholesterol/1 mM DTT for 6 hours at 30°C. The cleavage products
were then fractionated by SDS-PAGE and detected by Coomassie Brilliant Blue staining. The
activity of D303A and His6Hh-C17 were also assessed by incubating 1 µg of protein with 46 µM
[3H]cholesterol (11.6 Ci/mmole)/1 mM DTT for 6 hours at 30°C. The proteins were then
subjected to SDS-PAGE and labeled proteins detected by autoradiography.

Database searching and sequence alignment: Screening of the non-redundant protein sequence
database at the National Center for Biotechnology Information (NIH) was performed using the
BLASTPGP program, which is an enhanced version of BLAST that produces gapped alignments
(Altschul and Gish, 1996). Additional searches were performed using the PSI-BLAST (Position-
Specific Iterative BLAST) program, which constructs position-specific weight matrices from the
BLASTPGP output and employs them for subsequent iterations of database screening using a
modification of the BLAST statistics (Altschul et al., in press). Alignments of multiple protein
sequences were constructed using the CLUSTALW program (Thompson et al., 1994) or the
MACAW program (Schuler et al., 1991).

Domain Identification and Structure Determination

Drosophila melanogaster Hh in which the signal sequence and most of the amino-terminal
signaling domain have been replaced by a hexahistidine tag was expressed in E. coli as previously
described (Porter et al., 1995). Following purification with Ni2+-NTA agarose, this protein
cleaves itself in vitro in the presence of either DTT or cholesterol to liberate the 25 kDa Hh-C
fragment (Hh-C25, residues Cys-258 to Asp-471). Hh-C25 prepared by this method was found to
be poorly soluble in the absence of detergents and susceptible to further proteolytic breakdown
when concentrated to 1 mg/ml or greater. Treatment of Hh-C25 with subtilisin, however, resulted
in a protease-stable fragment of ~17 kDa molecular weight (Hh-C17) with improved solubility.
Mass spectrometric analysis of cyanogen bromide cleavage fragments of Hh-C17 showed it to
consist of residues Cys-258 to Ser-408 (data not shown). All residues absolutely conserved in
Hh-C homologues (Porter et al., 1996a), including the nematode sequences, are contained in
Hh-C17. To determine if Hh-C17 retained autoprocessing activity, a mutant version of His6-tagged
Hh-C containing a termination codon at residue position 409 (His6Hh-C17) was expressed and
assayed for autocleavage in the presence of DTT and cholesterol. As shown in Figure 1,
His6Hh-C17 is capable of cleaving itself in the presence of DTT but not cholesterol, indicating that
His6Hh-C17 is able to form the thioester intermediate (see Figure 1A) but that some portion of the
carboxy-terminal 63 residues of Hh-C25 (Leu-409 to Asp-471) is required for cholesterol transfer.

Crystals of Hh-C17 that diffracted to at least 1.9 Å Bragg spacings were readily produced
from both native and selenomethionyl-substituted (SeMet) protein. The crystal structure of
Hh-C17 was determined by the method of multiwavelength anomalous diffraction (MAD) using
SeMet crystals (Hendrickson et al., 1990; Hendrickson, 1991). High quality experimental electron
density maps allowed construction of an atomic model for Hh-C17 residues Cys-258 to Ala-402
that readily refined to low R-factor with good stereochemistry. Final refinement and
stereochemical statistics are summarized in Table 1.
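
The R-factor referred to here is defined in the Table 1 legend as the summed absolute difference between observed and calculated structure factor amplitudes divided by the summed observed amplitudes. A minimal sketch of that definition follows; the function name `r_value` and the amplitudes are hypothetical and introduced only for illustration, and this is not the X-PLOR implementation. The free R-value uses the same formula restricted to the subset of reflections excluded from refinement.

```python
def r_value(f_obs, f_calc):
    """Crystallographic R-value: sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Hypothetical amplitudes, for illustration only (not data from this work).
f_obs = [10.0, 20.0, 30.0, 40.0]
f_calc = [9.0, 22.0, 27.0, 44.0]

print(round(r_value(f_obs, f_calc), 3))
```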



Description of Hh-C17 Structure

Hh-C17 possesses an all-β structure that is roughly disk-shaped with a diameter of ~35 Å
and width of ~20 Å. The amino and carboxy termini emerge from the same surface of Hh-C17 ~6
Å apart. A ribbon drawing and topology diagram of the Hh-C17 structure are shown in Figure 3.
An unexpected feature of the Hh-C17 structure is the presence of two homologous subdomains
related by a pseudo-twofold axis of symmetry (Figures 4A and 4B). The subdomains adopt an
irregular fold characterized by three extended β-hairpin loops and are intimately associated, burying
1372 Å2 of surface area at a hydrophobic interface such that a single hydrophobic core exists for
the entire Hh-C17 molecule. The topology of the Hh-C17 subdomains matches that of snake toxins
such as cardiotoxin VIM (Rees et al., 1990) and α-bungarotoxin (Love and Stroud, 1986), but the
toxin and Hh-C17 structures do not superimpose well and these structures do not seem otherwise
related. As discussed below, the full Hh-C17 fold can be detected in the self-splicing region of
inteins (Duan et al., 1997), and the evidence for a divergent evolutionary relationship in this case is
strong.

Despite a low level of sequence conservation, the two Drosophila Hh-C17 subdomains are
superimposable with an r.m.s. deviation in α-carbon positions of 1.38 Å, and several notable
structural features, including β-bulges and specific β-turn types, are conserved between the
subdomains (Figure 4C). A structure-based alignment of the Drosophila Hh-C17 subdomain
sequences is shown in Figure 4D. While 8 out of 50 amino-acid residues (16%) in this alignment
are conserved, none of these 8 residues is absolutely conserved in both subdomains of all Hh-C
homologues. A characteristic pattern of conserved amino-acid types, mostly hydrophobic
residues, is discernible in an alignment of these homologues, however.

The level of structural similarity between the two Hh-C17 subdomains suggests that
Hh-C17 could have arisen by tandem duplication of a primordial gene. The duplicated sequences do
not, however, correspond directly to the compact subdomains observed in the Hh-C17 structure.
As can be seen in Figures 4B and 4C, the Hh-C17 subdomains have exchanged homologous loop
regions. Examination of Figure 4B shows how the loop exchange in Hh-C17 could be achieved by
a simple pivot of the loops about a single flex point. The structurally cohesive subdomains of
Hh-C17 are thus mosaics composed of elements from both units of the tandem sequence duplication.
To illustrate, if the three successive loops in each Hh-C17 subdomain are labeled 1-2-3 and
A1-A2-A3-B1-B2-B3 in the duplicated molecule prior to loop swapping, then the exchange of the
third loop between subdomains can be represented as A1-A2-(A3-B1-B2)-B3 where the structurally
distinct subdomains are composed of loops either inside or outside of the parentheses (see Figure
4B). We note that duplication coupled with an interdomain structural exchange such as appears to
have occurred in Hh-C17 provides a mechanism to generate permutations in the order in which
specific structural elements occur in the amino-acid sequence. Such permutations have been noted
in other systems including saposin homologues (Ponting and Russell, 1995) and bacterial
glucanases (Heinemann and Hahn, 1995).
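
The loop-exchange notation above can be restated as a small bookkeeping exercise. The sketch below is illustrative only: the variable and function names are assumptions introduced here, and the code encodes nothing beyond the labeling A1-A2-(A3-B1-B2)-B3 given in the text.

```python
# Loops in sequence order in the duplicated molecule, before loop swapping.
loops = ["A1", "A2", "A3", "B1", "B2", "B3"]

# Sequence repeats produced by the tandem duplication.
repeat_a, repeat_b = loops[:3], loops[3:]

# After exchange of the third loop: A1-A2-(A3-B1-B2)-B3.
inside = loops[2:5]               # loops inside the parentheses
outside = loops[:2] + loops[5:]   # loops outside the parentheses


def repeats_used(subdomain):
    """Which sequence repeats (A or B) contribute loops to a subdomain."""
    return sorted({name[0] for name in subdomain})


print(inside, repeats_used(inside))
print(outside, repeats_used(outside))
```

Each structural subdomain draws loops from both sequence repeats, which is the sense in which the subdomains are mosaics of the duplicated units.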

The exchange of domains or elements of secondary structure has been observed in several
proteins and is believed to result in a more stable association of subunits in multidomain proteins
(Bennett et al., 1995). Exchange of structural regions has principally been observed between
independent polypeptide chains within homodimers, but the lac operon repressor and homologues
also appear to represent a case of exchange between duplicated domains within a single polypeptide
chain (Schumacher et al., 1994; Lewis et al., 1996).



Active Site Residues 

The amino-terminal residue of Hh-C17, Cys-258, is involved in both the thioester
formation and cholesterol transfer steps of Hh autoprocessing (see Figure 1A). Amino-acid side
chains participating directly in Hh autoprocessing chemistry will most likely possess polar groups,
and the only such residues near Cys-258 in the Hh-C17 structure are His-329, Thr-326, and
Asp-303. The arrangement of these three amino acids in relation to Cys-258 is shown in Figures 5A
and 5B. His-329 and Thr-326 are absolutely conserved in all Hh-C homologues, and the side
chains of both of these residues are within hydrogen bonding distance of the α-amino group of
Cys-258 in the Hh-C17 structure. Asp-303 is invariably aspartic acid or histidine in Hh-C
domains, and the side chain of Asp-303 is exposed to solvent 4.2-4.5 Å away from the Cys-258
thiol group. Significant structural rearrangements would appear necessary for additional residues
in Hh-C17 to participate directly in Hh autoprocessing. Cys-258, His-329, and Asp-2S2 do not
form a serine protease-like catalytic triad as had been proposed (Lee et al., 1994; Porter et al.,
1996a).

To assess the involvement of His-329, Thr-326, and Asp-303 in Hh autoprocessing, each
of these residues was mutated to alanine within the context of the full-length His6Hh-C25 protein,
and the mutant proteins were expressed and assayed for Hh autoprocessing activity. The
autocleaving activity of the mutant proteins in the presence of high concentrations of DTT was used
as an assay for thioester formation, the first step in the Hh autoprocessing reaction, while the
autocleaving activity in the presence of cholesterol was used to assay for cholesterol transfer, the
second step in the autoprocessing reaction. The results of these assays are shown in Figure 5C.
His-329 is known from earlier experiments to be essential for Hh autoprocessing activity (Lee et
al., 1994), and the His-329 to alanine mutant (H329A) was inactive in both the DTT- and
cholesterol-stimulated reactions. The Thr-326 to alanine mutant (T326A) also showed greatly
reduced activity in both assays. By contrast, the Asp-303 to alanine mutant (D303A) was active in
the DTT-stimulated reaction but inactive in the cholesterol-stimulated reaction.

The loss or dramatic reduction of autocleaving activity in the presence of both DTT and
cholesterol for H329A and T326A implicates both His-329 and Thr-326 in formation of the
internal thioester during Hh autoprocessing. The interaction of the side chains of both of these
residues with the α-amino group of Cys-258, a component of the cleaved peptide bond, strongly
implies a direct role for these residues in thioester formation. Possible roles for His-329 during
thioester formation include stabilization of negative charge on the carbonyl oxygen of Gly-257,
donation of a proton to the free α-amino group of Cys-258, and maintenance of an appropriate
orientation of reaction components through polar interactions. His-329 may also deprotonate the
thiol group of Cys-258 prior to thioester formation, but if this is the case some rearrangement of
Cys-258 relative to its position in the Hh-C17 crystal structure would be required to bring the thiol
group of Cys-258 into proximity with His-329. As the pKa of the thiol group in free cysteine is
8.5, a base may not be needed to catalyze thiol deprotonation. Possible roles for Thr-326 in
thioester formation seem more limited. The high pKa of a threonine hydroxyl group (>15) makes
Thr-326 an unlikely candidate for proton transfers, suggesting that this residue is needed to form
polar interactions that stabilize reactive conformations within the Hh protein.
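
The pKa argument can be made quantitative with the Henderson-Hasselbalch relation. The sketch below is a rough illustration under stated assumptions: it uses the free-cysteine thiol pKa of 8.5 and the lower bound of 15 for a threonine hydroxyl quoted above, takes pH 7.4 from the Tris-HCl buffer of the autocleavage assays, and ignores any shift in pKa caused by the protein environment. The function name `fraction_deprotonated` is introduced here for illustration.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of an acid HA present as A- at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


ASSAY_PH = 7.4  # pH of the Tris-HCl buffer used in the autocleavage assays

thiolate = fraction_deprotonated(8.5, ASSAY_PH)    # free cysteine thiol
alkoxide = fraction_deprotonated(15.0, ASSAY_PH)   # threonine hydroxyl, pKa lower bound

print(f"thiolate fraction: {thiolate:.3f}")
print(f"alkoxide fraction: {alkoxide:.1e}")
```

Under these assumptions roughly 7% of a free cysteine thiol is already deprotonated at the assay pH, whereas the deprotonated fraction of a threonine hydroxyl is many orders of magnitude smaller.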

The activity of the D303A mutant in DTT- but not cholesterol-stimulated autoprocessing
shows that Asp-303 is not needed for thioester formation but is required for cholesterol transfer.
The negatively-charged aspartic acid residue seems unlikely to be involved in binding a
hydrophobic cholesterol molecule. A role in activating the cholesterol molecule for nucleophilic
attack of the thioester appears more plausible. For cholesterol to become an effective nucleophile,
the 3β-hydroxyl group must become deprotonated, and Asp-303 is a good candidate for the general
base that catalyzes this deprotonation. Substitution of Asp-303 with histidine in Hh-C homologues
is consistent with this hypothesis as histidine is also capable of functioning as a general base.

As indicated by the inactivity of Hh-C17 in cholesterol transfer assays, residues in the 63
amino acids removed from the Hh-C25 carboxy terminus are also involved in cholesterol transfer.
The proximity of the carboxy terminus of Hh-C17 to the active site implies a direct role for these
residues in cholesterol binding or activation. The decreased solubility of Hh-C25 relative to
Hh-C17 suggests that the carboxy-terminal 63 residues of Hh-C25 may possess an exposed
hydrophobic region that could serve as a cholesterol binding site.

Relationship between Hh-C17 and Self-Splicing Proteins

An earlier analysis identified a 36 amino acid conserved motif in the amino-terminal regions
of Hh-C homologues and inteins (Koonin, 1995). A greatly expanded database of Hh-C and
intein sequences coupled with recent enhancements of the BLAST method for database searching
enabled extension of the detectable region of sequence similarity to the amino-terminal ~100 amino
acids of Hh-C and intein sequences (p -iO'MCH). The improved methods for database searching
include statistical analysis of gapped alignments and iterative database scanning with position-
specific matrices derived from previous BLAST outputs (Altschul et al., in press). When a
database search was initiated with any of the Hh-C sequences or with most of the intein sequences,
members of the respective second protein family were the only additional sequences retrieved from
the database at a statistically significant level.

Solution of the Hh-C17 crystal structure showed the expanded region of Hh-C/intein
sequence homology to terminate halfway through one of the subdomains in the turn region of an
exposed loop between β strands 3b and 4b (see Figure 3A). This observation, coupled with the
presence of characteristic endonuclease motifs in intein sequences shortly after the end of the
detectable Hh-C/intein homology, suggested that the intein endonuclease domain had been inserted
into the β3b-β4b loop of an Hh-C17-like structure. This hypothesis focused the search for
resumption of the Hh-C/intein sequence similarity in intein sequences likely to follow the
endonuclease region. The recently determined crystal structure of the PI-SceI intein indeed shows
the insertion of the endonuclease region of the intein in the β3b-β4b loop of an Hh-C17-like
structure and indicates that the region of the intein sequence in which the similarity to Hh-C17 must
resume is immediately amino-terminal to the second extein (Duan et al., 1997). An alignment of
Hh-C17 and intein sequences is shown in Figure 6A. A fully structure-based alignment of the
Hh-C17 and intein sequences awaits direct comparison of the atomic coordinates of Hh-C17 and
PI-SceI.

As can be seen in Figure 6A, aside from sites with conserved hydrophobic character the
only residues absolutely or nearly absolutely conserved between Hh-C homologues and inteins are
those identified in the active site of Hh-C17 and shown to be important for thioester formation by
site-directed mutagenesis. The amino-terminal cysteine residue directly involved in thioester
formation in Hh-C homologues is replaced by serine in some inteins, and these inteins form an
ester rather than a thioester intermediate. The yeast HO endonuclease, which lacks an amino-
terminal serine or cysteine residue, does not have self-splicing activity (Perler et al., 1997), and the
only intein homologue in which this residue is replaced by alanine (KlbA protein homologue from
Methanococcus jannaschii) is suspected to be inactive as well.

The only residue absolutely conserved between Hh-C homologues and inteins is a histidine
corresponding to His-329 in Drosophila Hh-C. The presence of His-329 in the active site of
Drosophila Hh-C and the loss of thioester formation activity when His-329 is mutated strongly
imply that this histidine is conserved because it performs a vital role in thioester formation and that
it functions similarly in inteins and Hh-C homologues. The only other residue conserved in the
active site of Hh-C homologues and shown by mutagenesis to be required for efficient thioester
formation, Thr-326, is also extremely conserved in intein sequences. Of the 39 intein sequences in
the database at the time of our comparison, 34 sequences contain a threonine at a homologous
position to Thr-326, while three inteins have serine, and one each have asparagine or glutamic acid
at this position. The high level of conservation of threonine at this active site position and its
substitution with similar amino acids suggests a conserved role for this threonine in inteins and
Hh-C homologues. A conserved residue homologous to Asp-303, also found in the Hh-C17 active
site, is not found in intein sequences, consistent with its role in cholesterol activation rather than
thioester formation.
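
The conservation tallies for the position homologous to Thr-326 can be checked and expressed as percentages. The sketch below uses only the counts stated in the text (34 threonine, three serine, one asparagine, one glutamic acid among 39 intein sequences); the grouping of threonine with serine as hydroxyl-bearing side chains is an illustrative choice made here, not a classification taken from the source.

```python
from collections import Counter

# Residues at the position homologous to Thr-326, as tallied in the text.
counts = Counter({"Thr": 34, "Ser": 3, "Asn": 1, "Glu": 1})

total = sum(counts.values())
assert total == 39, "tallies should account for all 39 intein sequences"

for residue, n in counts.most_common():
    print(f"{residue}: {n}/{total} = {100 * n / total:.1f}%")

hydroxyl = counts["Thr"] + counts["Ser"]  # side chains bearing a hydroxyl group
print(f"Thr or Ser: {100 * hydroxyl / total:.1f}%")
```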

As expected from the sequence homology, the structures of the self-splicing region of the
PI-SceI intein (Duan et al., 1997) and Hh-C17 are clearly homologous. Although not previously
noted, the self-splicing region of PI-SceI contains homologous subdomains related by
pseudosymmetry. The PI-SceI subdomains are homologous to the Hh-C17 subdomains and
possess the same loop exchange observed in Hh-C17. However, these features are obscured by
insertion of endonuclease-associated sequences. In addition to insertion of the core endonuclease
domain in the region homologous to the β3b-β4b loop, the PI-SceI intein contains an additional
insertion of amino acids relative to the Hh-C17 structure. The site of this insertion occurs in the
turn between β strands 1b and 2b in the Hh-C17 structure (see Figure 3A), and this inserted region
is believed to be involved in aiding DNA recognition by the PI-SceI intein (Duan et al., 1997).
Figure 6B shows a stereodiagram of the Hh-C17 structure depicted in the same orientation as the
PI-SceI intein structure in Duan et al. (1997) with the sites of the endonuclease-associated
insertions indicated.

The conservation of structure, sequence, and cleavage mechanism between Hh-C
homologues and the intein regions of self-splicing proteins firmly establishes the divergence of
these two protein families from a common precursor. Figure 7 shows a plausible evolutionary
scenario for the development of the Hh-C and intein protein families from a primordial domain of
unknown function. The Hh-C17 module is sufficient only for the initial replacement of a peptide
bond with a thioester or ester in both the Hh autoprocessing and self-splicing reactions. In both
protein families, residues carboxy terminal to the Hh-C17 module are needed for selecting or
contributing the second nucleophile that resolves the initial ester/thioester and determines the
products of the overall reaction. The loss of detectable sequence similarity in the region of the C.
elegans Hh-C homologues following the Hh-C17 module (R. Mann, personal communication)
raises the possibility that these residues may transfer a molecule other than cholesterol. The
ongoing expansion of sequence databases provides the prospect of additional Hh-C17 modules
being discovered that initiate novel splicing or transfer reactions by formation of ester or thioester
intermediates.



WO 98/30576 PCT/US97/15753 




Table Legend 

Table 1. Statistics for data collection, phase determination and refinement
(A) Rsym and completeness values were calculated treating Bijvoet mates as equivalent. Values in
parentheses for ⟨I/σ⟩ are for the highest resolution shell (1.98-1.9 Å).
$R_{sym} = 100 \times \sum_h \sum_i |I_i(h) - \langle I(h) \rangle| / \sum_h \sum_i I_i(h)$.
(B) r.m.s. (Δ|F|) / r.m.s. (|F|) where ΔF is the Bijvoet difference at one
wavelength (values on the diagonal) or the dispersive difference between two wavelengths (values
off the diagonal). Also shown are the anomalous components of the Se scattering factors as a
function of wavelength as determined by MADLSQ (Hendrickson, 1991). (C) All data for which
|F|>2σ were used in the refinement. A subset of the data (10%) was excluded from the refinement
and used to calculate the free R-value (Brünger, 1992). A final round of refinement including this
data was performed to produce the final set of coordinates and crystallographic R-value.
$R\text{-value} = \sum ||F_o| - |F_c|| / \sum |F_o|$.

Table 1. Statistics for data collection, phase determination and refinement
(A) Data Collection Statistics (30.0 to 1.9 Å)

Wavelength (Å)   Reflections (N)   Redundancy   Completeness (%)   Signal (⟨I/σ⟩)   Rsym (%)
0.9919           26,790            10.3         100.0              19.9 (4.2)       9.2
0.9793           26,791            10.6         100.0              19.4 (3.5)       9.9
0.9791           26,792            10.4         100.0              18.8 (3.6)       10.4
0.9656           26,792            10.5         100.0              16.7 (3.5)       10.3



(B) MAD Structure Factor Ratios and Anomalous Scattering Factors

Wavelength (Å)   0.9919   0.9793   0.9791   0.9556   f' (e)   f'' (e)
0.9919           0.041    0.064    0.059    0.053    -3.94    0.51
0.9793                    0.055    0.049    0.062    -9.45    3.23
0.9791                             0.076    0.058    -3.05    6.03
0.9686                                      0.063    -4.15    4.12



(C) Refinement and stereochemical statistics

R-value            0.213 (F>2σ, 6.0-1.9 Å)   0.222 (all F, 6.0-1.9 Å)
free R-value       0.275 (F>2σ, 6.0-1.9 Å)   0.283 (all F, 6.0-1.9 Å)
Average B (Å2)     21.5 for protein, 39.6 for solvent
Rms deviations
  Bonds (Å)        0.008
  Angles (°)       1.97
  B-values (Å2)    1.30/1.45 bonds/angles of main chain
                   2.83/3.20 bonds/angles of side chains

The above disclosure generally describes the present invention. A more complete
understanding can be obtained by reference to the following specific examples which are 
provided herein for purposes of illustration only and are not intended to limit the scope 
of the invention.

SEQUENCE LISTING



(1) GENERAL INFORMATION:

(i) APPLICANT: The Johns Hopkins University School of Medicine et
(ii) TITLE OF INVENTION: NOVEL HEDGEHOG-DERIVED POLYPEPTIDES
(iii) NUMBER OF SEQUENCES: 20 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Fish & Richardson P.C. 

(B) STREET: 4225 Executive Square, Suite 1400 

(C) CITY: La Jolla 

(D) STATE: CA 

(E) COUNTRY: U.S.A. 

(F) ZIP: 92037 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 07-OCT-1997 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 08/729,743 

(B) FILING DATE: 07-OCT-1996

(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: 60/---,---
(B) FILING DATE: 02-OCT-1997
(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Haile, Lisa A.

(B) REGISTRATION NUMBER: 38,347

(C) REFERENCE/DOCKET NUMBER: 07265/099WO1

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 619/678-5070 

(B) TELEFAX: 619/678-5099 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 144 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: both

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 1..142



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

GTG AAA CTG CGG GTG ACC GAG CCC TGG GAC GAA GAT GGC CAC CAC TCA 48
Val Lys Leu Arg Val Thr Glu Pro Trp Asp Glu Asp Gly His His Ser
1 5 10 15

CAG GAG TCT CTG CAC TAC GAG GGC CGC GCA GTG GAC ATC ACC ACG TCT 96
Gln Glu Ser Leu His Tyr Glu Gly Arg Ala Val Asp Ile Thr Thr Ser
20 25 30

GAC CGC GAC CGC AGC AAG TAC GGC ATG CTG GCC CGC CTG GCG GTG G 142
Asp Arg Asp Arg Ser Lys Tyr Gly Met Leu Ala Arg Leu Ala Val
35 40 45

AG 144

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 144 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: both

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..142



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

GTG AAG CTG CGG GTG ACC GAG GGC TGG GAC GAG GAC GGC CAC CAC TCA 48
Val Lys Leu Arg Val Thr Glu Gly Trp Asp Glu Asp Gly His His Ser
50 55 60

GAG GAG TCC CTG CAT TAT GAG GGC CGC GCG GTG GAC ATC ACC ACA TCA 96
Glu Glu Ser Leu His Tyr Glu Gly Arg Ala Val Asp Ile Thr Thr Ser
65 70 75

GAC CGC GAC CGC AAT AAG TAT GGA CTG CTG GCG CGC TTG GCA GTG G 142
Asp Arg Asp Arg Asn Lys Tyr Gly Leu Leu Ala Arg Leu Ala Val 
80 85 90 

AG 144 
(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: both 



(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

Ile Ser Ser His Val His Gly Cys Phe Thr Pro Glu Ser Thr
1 5 10

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: both 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Ser Ile Ser His Met His Gly Cys Phe Thr Pro Glu Ser Thr
1 5 10

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 14 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: not relevant
(D) TOPOLOGY: both 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Val Ala Ala Lys Ser Gly Gly Cys Phe Pro Gly Ser Ala Thr
1 5 10

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: both 

(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

Val Ala Ala Lys Ser Asp Gly Cys Phe Pro Gly Ser Ala Thr
1 5 10

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid
(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: both

(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

Val Ala Ala Lys Ser Gly Gly Cys Phe Pro Gly Ser Ala Leu
1 5 10

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: both 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

Val Ala Ala Lys Ser Gly Gly Cys Phe Pro Gly Ser Gly Thr
1 5 10

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: both 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

Val Ala Ala Lys Ser Gly Gly Cys Phe Pro Ala Gly Ala Arg Thr
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: both 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Val Ala Ala Lys Thr Gly Gly Cys Phe Pro Ala Gly Ala Gln
1 5 10

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: both 

(ii) MOLECULE TYPE : protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Val Ala Ala Lys Thr Gly Gly Cys Phe Pro Gly Glu Ala Leu 
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: both 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Leu Gly Val Arg Ser Gly Gly Cys Phe Pro Gly Thr Ala Met
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: both 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Leu Ala Val Arg Ala Gly Gly Cys Phe Pro Gly Asn Ala Thr
1 5 10

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: not relevant
(D) TOPOLOGY: both

(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

His Gly His Gly Cys Phe Thr Pro
1 5

(2) INFORMATION FOR SEQ ID NO : 15 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: both 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

His Gly His Gly Cys Phe Thr Pro 
1 5 

(2) INFORMATION FOR SEQ ID NO:16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: both 

(ii) MOLECULE TYPE : protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

Lys Ser Gly Gly Cys Phe Pro Gly 
1 5 
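
SEQ ID NOs: 4 to 16 above each contain the residues Gly Cys Phe. As an illustration only, and not part of the original listing, the following minimal Python sketch converts a three-letter residue string to one-letter code, compares its length with the declared LENGTH field, and locates that motif. The helper names `to_one_letter` and `find_motif` are hypothetical; the residue table is the standard three-letter to one-letter mapping for the twenty common amino acids.

```python
# Illustrative sketch only; helper names are hypothetical, not from the listing.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_seq):
    """Convert a whitespace-separated three-letter sequence to one-letter code."""
    residues = three_letter_seq.split()
    unknown = [r for r in residues if r not in THREE_TO_ONE]
    if unknown:
        # Any code outside the table is reported rather than guessed at.
        raise ValueError(f"unrecognized residue codes: {unknown}")
    return "".join(THREE_TO_ONE[r] for r in residues)


def find_motif(one_letter_seq, motif="GCF"):
    """Return the 1-based position of the first motif residue, or None."""
    index = one_letter_seq.find(motif)
    return None if index < 0 else index + 1


# SEQ ID NO: 16 as given in the listing, with its declared LENGTH of 8.
seq16 = to_one_letter("Lys Ser Gly Gly Cys Phe Pro Gly")
declared_length = 8
length_matches = len(seq16) == declared_length
motif_position = find_motif(seq16)
```

Because unrecognized codes raise an error instead of being skipped, the same check can flag a transcription error in a longer entry before its length is compared with the declared value.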



(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 416 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

Met Asp Val Arg Leu His Leu Lys Gln Phe Ala Leu Leu Cys Phe Ile
1 5 10 15

Ser Leu Leu Leu Thr Pro Cys Gly Leu Ala Cys Gly Pro Gly Arg Gly
20 25 30

Tyr Gly Lys Arg Arg His Pro Lys Lys Leu Thr Pro Leu Ala Tyr Lys
35 40 45

Gln Phe Ile Pro Asn Val Ala Glu Lys Thr Leu Gly Ala Ser Gly Lys
50 55 60

Tyr Glu Gly Lys Ile Thr Arg Asn Ser Glu Arg Phe Lys Glu Leu Ile
65 70 75 80

Pro Asn Tyr Asn Pro Asp Ile Ile Phe Lys Asp Glu Glu Asn Thr Asn
85 90 95

Ala Asp Arg Leu Met Thr Lys Arg Cys Lys Asp Lys Leu Asn Ser Leu
100 105 110

Ala Ile Ser Val Met Asn His Trp Pro Gly Val Lys Leu Arg Val Thr
115 120 125

Glu Gly Trp Asp Glu Asp Gly His His Leu Glu Glu Ser Leu His Tyr
130 135 140

Glu Gly Arg Ala Val Asp Ile Thr Thr Ser Asp Arg Asp Lys Ser Lys
145 150 155 160

Tyr Gly Met Leu Ser Arg Leu Ala Val Glu Ala Gly Phe Asp Trp Val
165 170 175

Tyr Tyr Glu Ser Lys Ala His Ile His Cys Ser Val Lys Ala Glu Asn
180 185 190

Ser Val Ala Ala Lys Ser Gly Gly Cys Phe Pro Gly Ser Gly Thr Val
195 200 205

Thr Leu Gly Asp Gly Thr Arg Lys Pro Ile Lys Asp Leu Lys Val Gly
210 215 220

Asp Arg Val Leu Ala Ala Asp Glu Lys Gly Asn Val Leu Ile Ser Asp
225 230 235 240

Phe Ile Met Phe Ile Asp His Asp Pro Thr Thr Arg Arg Gln Phe Ile
245 250 255

Val Ile Glu Thr Ser Glu Pro Phe Thr Lys Leu Thr Leu Thr Ala Ala
260 265 270

His Leu Val Phe Val Gly Asn Ser Ser Ala Ala Ser Gly Ile Thr Ala
275 280 285

Thr Phe Ala Ser Asn Val Lys Pro Gly Asp Thr Val Leu Val Trp Glu
290 295 300

Asp Thr Cys Glu Ser Leu Lys Ser Val Thr Val Lys Arg Ile Tyr Thr
305 310 315 320

Glu Glu His Glu Gly Ser Phe Ala Pro Val Thr Ala His Gly Thr Ile
325 330 335

Ile Val Asp Gln Val Leu Ala Ser Cys Tyr Ala Val Ile Glu Asn His
340 345 350

Lys Trp Ala His Trp Ala Phe Ala Pro Val Arg Leu Cys His Lys Leu
355 360 365

Met Thr Trp Leu Phe Pro Ala Arg Glu Ser Asn Val Asn Phe Gln Glu
370 375 380

Asp Gly Ile His Trp Tyr Ser Asn Met Leu Phe His Ile Gly Ser Trp
385 390 395 400

Leu Leu Asp Arg Asp Ser Phe His Pro Leu Gly Ile Leu His Leu Ser
405 410 415



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 418 amino acids 

(B) TYPE: amino acid 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Met Arg Leu Leu Thr Arg Val Leu Leu Val Ser Leu Leu Thr Leu Ser
1 5 10 15

Leu Val Val Ser Gly Leu Ala Cys Gly Pro Gly Arg Gly Tyr Gly Arg
20 25 30

Arg Arg His Pro Lys Lys Leu Thr Pro Leu Ala Tyr Lys Gln Phe Ile
35 40 45

Pro Asn Val Ala Glu Lys Thr Leu Gly Ala Ser Gly Arg Tyr Glu Gly
50 55 60

Lys Ile Thr Arg Asn Ser Glu Arg Phe Lys Glu Leu Thr Pro Asn Tyr
65 70 75 80

Asn Pro Asp Ile Ile Phe Lys Asp Glu Glu Asn Thr Gly Ala Asp Arg
85 90 95

Leu Met Thr Gln Arg Cys Lys Asp Lys Leu Asn Ser Leu Ala Ile Ser
100 105 110

Val Met Asn His Trp Pro Gly Val Lys Leu Arg Val Thr Glu Gly Trp
115 120 125

Asp Glu Asp Gly His His Phe Glu Glu Ser Leu His Tyr Glu Gly Arg
130 135 140

Ala Val Asp Ile Thr Thr Ser Asp Arg Asp Lys Ser Lys Tyr Gly Thr
145 150 155 160

Leu Ser Arg Leu Ala Val Glu Ala Gly Phe Asp Trp Val Tyr Tyr Glu
165 170 175

Ser Lys Ala His Ile His Cys Ser Val Lys Ala Glu Asn Ser Val Ala
180 185 190

Ala Lys Ser Gly Gly Cys Phe Pro Gly Ser Ala Leu Val Ser Leu Gln
195 200 205

Asp Gly Gly Gln Lys Ala Val Lys Asp Leu Asn Pro Gly Asp Lys Val
210 215 220

Leu Ala Ala Asp Ser Ala Gly Asn Leu Val Phe Ser Asp Phe Ile Met
225 230 235 240

Phe Thr Asp Arg Asp Ser Thr Thr Arg Arg Val Phe Tyr Val Ile Glu
245 250 255

Thr Gln Glu Pro Val Glu Lys Ile Thr Leu Thr Ala Ala His Leu Leu
260 265 270

Phe Val Leu Asp Asn Ser Thr Glu Asp Leu His Thr Met Thr Ala Ala
275 280 285

Tyr Ala Ser Ser Val Arg Ala Gly Gln Lys Val Met Val Val Asp Asp
290 295 300

Ser Gly Gln Leu Lys Ser Val Ile Val Gln Arg Ile Tyr Thr Glu Glu
305 310 315 320

Gln Arg Gly Ser Phe Ala Pro Val Thr Ala His Gly Thr Ile Val Val
325 330 335

Asp Arg Ile Leu Ala Ser Cys Tyr Ala Val Ile Glu Asp Gln Gly Leu
340 345 350

Ala His Leu Ala Phe Ala Pro Ala Arg Leu Tyr Tyr Tyr Val Ser Ser
355 360 365

Phe Leu Phe Pro Gln Asn Ser Ser Ser Arg Ser Asn Ala Thr Leu Gln
370 375 380

Gln Glu Gly Val His Trp Tyr Ser Arg Leu Leu Tyr Gln Met Gly Thr
385 390 395 400

Trp Leu Leu Asp Ser Asn Met Leu His Pro Leu Gly Met Ser Val Asn
405 410 415 

Ser Ser 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 425 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: both 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

Met Val Glu Met Leu Leu Leu Thr Arg Ile Leu Leu Val Gly Phe Ile
1 5 10 15

Cys Ala Leu Leu Val Ser Ser Gly Leu Thr Cys Gly Pro Gly Arg Gly
20 25 30

Ile Gly His Arg Arg His Pro Lys Lys Leu Thr Pro Leu Ala Tyr Lys
35 40 45

Gln Phe Ile Pro Asn Val Ala Glu Lys Thr Leu Gly Ala Ser Gly Arg
50 55 60

Tyr Glu Gly Lys Ile Thr Arg Asn Ser Glu Arg Phe Lys Glu Leu Ile
65 70 75 80

Pro Asn Tyr Asn Pro Asp Ile Ile Phe Lys Asp Glu Glu Asn Thr Gly
85 90 95

Ala Asp Arg Leu Met Thr Cys Arg Cys Lys Asp Lys Leu Asn Ala Leu
100 105 110

Ala Ile Ser Val Met Asn Cys Trp Pro Gly Val Met Leu Arg Val Thr
115 120 125

Glu Gly Trp Asp Glu Asp Gly His His Ser Lys Glu Ser Leu His Tyr
130 135 140

Glu Gly Arg Ala Val Asp Ile Thr Thr Ser Asp Arg Asp Arg Ser Lys
145 150 155 160

Tyr Gly Met Leu Ala Arg Leu Ala Val Glu Ala Gly Phe Asp Trp Val
165 170 175

Tyr Tyr Glu Ser Lys Ala His Ile Cys Ser Val Lys Ala Glu Asn Ser
180 185 190

Val Ala Ala Lys Ser Gly Gly Cys Phe Pro Gly Ser Ala Thr Val His
195 200 205

Leu Glu His Gly Gly Thr Lys Leu Val Lys Asp Leu Ser His Gly Asp
210 215 220

Arg Val Leu Ala Ala Asp Ala Asp Gly Arg Leu Leu Val Ser Asp Phe
225 230 235 240

Leu Leu Thr Phe Leu Asp Arg Met Asp Ser Ser Arg Lys Leu Phe Tyr
245 250 255

Val Ile Glu Thr Arg Gln Pro Arg Ala Arg Leu Leu Leu Thr Ala Ala
260 265 270

His Leu Leu Phe Val Ala Pro Gln His Asn Gln Ser Glu Ala Thr Gly
275 280 285

Ser Thr Ser Gly Gln Ala Leu Phe Ala Ser Asn Val Lys Pro Gly Gln
290 295 300

Pro Val Val Val Leu Gly Glu Gly Gly Gln Gln Leu Leu Pro Ala Ser
305 310 315 320

Val His Ser Val Ser Leu Arg Glu Glu Ala Ser Gly Ala Tyr Ala Pro
325 330 335

Thr Thr Ala Cys Gly Thr Ile Leu Ile Asn Arg Val Leu Ala Ser Cys
340 345 350

Tyr Ala Val Ile Glu Glu His Ser Trp Ala His Ala Ala Phe Ala Pro
355 360 365

His Arg Leu Ala Gln Gly Leu Leu Ala Ala Leu Cys Pro Asp Gly Ala
370 375 380

Ile Pro Thr Ala Ala Thr Thr Thr Thr Gly Ile His Trp Tyr Ser Arg
385 390 395 400

Leu Leu Tyr Arg Ile Gly Ser Trp Val Leu Asp Gly Asp Ala Leu His
405 410 415 

Pro Leu Gly Met Val Ala Pro Ala Ser 
420 425 

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 437 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 

Met Leu Leu Leu Leu Ala Arg Cys Phe Leu Val Ile Leu Ala Ser Ser
1 5 10 15

Leu Leu Val Cys Pro Gly Leu Ala Cys Gly Pro Gly Arg Gly Phe Gly
20 25 30

Lys Arg Arg His Pro Lys Lys Leu Thr Pro Leu Ala Tyr Lys Gln Phe
35 40 45

Ile Pro Asn Val Ala Glu Lys Thr Leu Gly Ala Ser Gly Arg Tyr Glu
50 55 60

Gly Lys Ile Thr Arg Asn Ser Glu Arg Phe Lys Glu Leu Thr Pro Asn
65 70 75 80

Tyr Asn Pro Asp Ile Ile Phe Lys Asp Glu Glu Asn Thr Gly Ala Asp
85 90 95

Arg Leu Met Thr Gln Arg Cys Lys Asp Lys Leu Asn Ala Leu Ala Ile
100 105 110

Ser Val Met Asn Gln Trp Pro Gly Val Lys Leu Arg Val Thr Glu Gly
115 120 125

Trp Asp Glu Asp Gly His His Ser Glu Glu Ser Leu His Tyr Glu Gly
130 135 140

Arg Ala Val Asp Ile Thr Thr Ser Asp Arg Asp Arg Ser Lys Tyr Gly
145 150 155 160

Met Leu Ala Arg Leu Ala Val Glu Ala Gly Phe Asp Trp Val Tyr Tyr
165 170 175

Glu Ser Lys Ala His Ile His Cys Ser Val Lys Ala Glu Asn Ser Val
180 185 190

Ala Ala Lys Ser Gly Gly Cys Phe Pro Gly Ser Ala Thr Val His Leu
195 200 205

Glu Gln Gly Gly Thr Lys Leu Val Lys Asp Leu Arg Pro Gly Asp Arg
210 215 220

Val Leu Ala Ala Asp Asp Gln Gly Arg Leu Leu Tyr Ser Asp Phe Leu
225 230 235 240

Thr Phe Leu Asp Arg Asp Glu Gly Ala Lys Lys Val Phe Tyr Val Ile
245 250 255

Glu Thr Leu Glu Pro Arg Glu Arg Leu Leu Leu Thr Ala Ala His Leu
260 265 270

Leu Phe Val Ala Pro His Asn Asp Ser Gly Pro Thr Pro Gly Pro Ser
275 280 285

Ala Leu Phe Ala Ser Arg Val Arg Pro Gly Gln Arg Val Tyr Val Val
290 295 300

Ala Glu Arg Gly Gly Asp Arg Arg Leu Leu Pro Ala Ala Val His Ser
305 310 315 320

Val Thr Leu Arg Glu Glu Glu Ala Gly Ala Tyr Ala Pro Leu Thr Ala
325 330 335

His Gly Thr Ile Leu Ile Asn Arg Val Leu Ala Ser Cys Tyr Ala Val
340 345 350

Ile Glu Glu His Ser Trp Ala His Arg Ala Phe Ala Pro Phe Arg Leu
355 360 365

Ala His Ala Leu Leu Ala Ala Leu Ala Pro Ala Arg Thr Asp Gly Gly
370 375 380

Gly Gly Gly Ser Ile Pro Ala Ala Gln Ser Ala Thr Glu Ala Arg Gly
385 390 395 400

Ala Glu Pro Thr Ala Gly Ile His Trp Tyr Ser Gln Leu Leu Tyr His
405 410 415

Ile Gly Thr Trp Leu Leu Asp Ser Glu Thr Met His Pro Leu Gly Met
420 425 430 

Ala Val Lys Ser Ser 
435 

CLAIMS

1. A substantially pure polypeptide characterized by having an amino acid
sequence of a hedgehog polypeptide or a fragment derived from amino
terminal amino acids of a hedgehog polypeptide, wherein the polypeptide
or fragment thereof comprises a sterol moiety.

2. The polypeptide of claim 1, wherein the fragment is an extracellular
hedgehog polypeptide fragment.

3. The polypeptide of claim 2, wherein the fragment has at its carboxy
terminus, a G↓CF cleavage site specifically recognized by a proteolytic
activity of the carboxy terminal fragment of a native hedgehog polypeptide.

4. The polypeptide of claim 1, wherein the sterol is cholesterol.

5. The polypeptide of claim 1, wherein the hedgehog polypeptide is selected
from the group consisting of Drosophila, Zebrafish, Xenopus, chicken,
murine and human hedgehog.

6. The polypeptide of claim 1, wherein the hedgehog fragment is about 30
to 450 amino acids in length.

7. The polypeptide of claim 1, wherein the hedgehog fragment is about 50
to 300 amino acids in length.

8. The polypeptide of claim 1, wherein the hedgehog fragment is about 75
to 250 amino acids in length.

9. The polypeptide of claim 1, wherein the hedgehog fragment is about 100
to 200 amino acids in length.

10. A method for affecting cholesterol biosynthesis or transport in a cell
comprising contacting a cell with an effective amount of a compound that
affects hedgehog, thereby affecting cholesterol biosynthesis or transport.

11. The method of claim 10, wherein the effect is inhibition.

12. The method of claim 10, wherein the effect is stimulation.

13. A method for inhibiting the neural inducing activity of a hedgehog
polypeptide in cells, comprising
introducing into the cells a polypeptide selected from the group consisting
of:

a) SEQ ID NO: 17, beginning at amino acid residue 257 to the final
carboxy amino acid;

b) SEQ ID NO: 18, beginning at amino acid residue 257 to the final
carboxy amino acid;

c) SEQ ID NO: 19, beginning at amino acid residue 257 to the final
carboxy amino acid; and

d) SEQ ID NO: 20, beginning at amino acid residue 257 to the final
carboxy amino acid, and having at its carboxy terminus, a Gly↓Cys Phe
cleavage site specifically recognized by a proteolytic activity of the
carboxy terminal fragment of the native hedgehog polypeptide, wherein
the introduced polypeptide interferes with neural inducing activity of a
hedgehog polypeptide in the cells, wherein the hedgehog peptide is
selected from the group consisting of

e) the amino acid sequence of SEQ ID NO: 1;

f) the amino acid sequence of SEQ ID NO: 2;

g) SEQ ID NO: 17, from amino acid residue 1 to amino acid residue 259;

h) SEQ ID NO: 18, from amino acid residue 1 to amino acid residue 259;

i) SEQ ID NO: 19, from amino acid residue 1 to amino acid residue 259;
and

j) SEQ ID NO: 20, from amino acid residue 1 to amino acid residue 259,
and having at its carboxy terminus, a Gly↓Cys Phe cleavage site
specifically recognized by a proteolytic activity of the carboxy terminal
fragment of the native hedgehog polypeptide.



14. The method of claim 13, wherein the introduced polypeptide induces the
development of dorsal anatomical structures.

15. The method of claim 13, wherein the introduced polypeptide increases the 
expression of en-2 and krox20. 

16. A method of inducing pituitary gland gene expression comprising
introducing to cells of the pituitary gland, a polypeptide selected from
the group consisting of

a) SEQ ID NO: 17, beginning at amino acid residue 257 to the final
carboxy amino acid;

b) SEQ ID NO: 18, beginning at amino acid residue 257 to the final
carboxy amino acid;

c) SEQ ID NO: 19, beginning at amino acid residue 257 to the final
carboxy amino acid; and

d) SEQ ID NO: 20, beginning at amino acid residue 257 to the final
carboxy amino acid, and having at its carboxy terminus, a Gly↓Cys Phe
cleavage site specifically recognized by a proteolytic activity of the
carboxy terminal fragment of the native hedgehog polypeptide, wherein
the introduced polypeptide induces the expression of pituitary gland-
specific genes.
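
Claims 13 and 16 define fragments by residue position, for example "beginning at amino acid residue 257 to the final carboxy amino acid" and "from amino acid residue 1 to amino acid residue 259". As an illustration only, and not part of the claims, the following Python sketch shows how such 1-based, inclusive residue ranges map onto zero-based slices. The function name `fragment` is hypothetical, and the short peptide of SEQ ID NO: 16 stands in for the full-length sequences purely to keep the example small.

```python
# Illustrative sketch only; `fragment` is a hypothetical helper.
def fragment(residues, start, end=None):
    """Return residues start..end, 1-based and inclusive.

    end=None means "to the final carboxy amino acid".
    """
    if start < 1:
        raise ValueError("residue numbering starts at 1")
    if end is None:
        end = len(residues)
    if end > len(residues) or start > end:
        raise ValueError("range falls outside the sequence")
    # Residue k sits at index k - 1, so the inclusive range is [start - 1, end).
    return residues[start - 1:end]


# SEQ ID NO: 16 as a list of three-letter codes (stand-in for a full sequence).
peptide = "Lys Ser Gly Gly Cys Phe Pro Gly".split()

c_terminal_part = fragment(peptide, 4)     # residue 4 to the final residue
n_terminal_part = fragment(peptide, 1, 3)  # residues 1 to 3
```

With a full-length sequence loaded as a list of residues, the same call with `start=257` or with `start=1, end=259` would select the ranges recited in the claims.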

[Figure: Hh-C Cleavage Mechanism. Labels: Hh-N; X = DTT, BME, cysteine, hydroxylamine, or ...]

Group I. Claims 1-9, drawn to a substantially pure polypeptide having an amino acid sequence of a hedgehog
II each have materially different chemical structures and materially different functional properties. These chemical
structures and functional properties are the special technical features that identify each invention and distinguish each